Abstract

We consider the problem of multicasting information from a source to a set of receivers over a 
network where intermediate network nodes perform randomized network coding operations on the source 
packets. We propose a channel model for the non-coherent network coding introduced by Koetter and 
Kschischang in [6], that captures the essence of such a network operation, and calculate the capacity as 
a function of network parameters. We prove that use of subspace coding is optimal, and show that, in
some cases, the capacity-achieving distribution uses subspaces of several dimensions, where the employed
dimensions depend on the packet length. This model and the results also allow us to give guidelines
on when subspace coding is beneficial for the proposed model and by how much, in comparison to a
coding vector approach, from a capacity viewpoint. We extend our results to the case of multiple source 
multicast that creates a virtual multiple access channel.

I. Introduction

The network coding techniques for information transmission in networks introduced in [1] have
attracted significant interest in the literature, both because they pose theoretically interesting questions
and because of their potential impact in applications. The first fundamental result proved in network
coding, and perhaps still the most useful from a practical point of view today, is that, using linear network
coding [2], [3], one can achieve rates up to the common min-cut value when multicasting to $N_r \ge 1$
receivers. In general this may require operations over a field of size approximately $\sqrt{N_r}$, which translates
to communication using packets of length $\frac{1}{2} \log N_r$ bits [4].

However, this result assumes that the receivers know perfectly the operations that the network nodes 
perform. In large dynamically changing networks, collecting network information comes at a cost, as it 
consumes bandwidth that could instead have been used for information transfer. In practical networks, 
where such deterministic knowledge is not sustainable, the most popular approach is to perform randomized
network coding [5] and to append coding vectors at the headers of the packets to keep track of
the linear combinations of the source packets they contain (see, e.g., [12]). The coding vectors have an
overhead of $h \log N_r$ bits, where $h$ is the total number of packets to be linearly combined. This results
in a loss of information rate that can be significant with respect to the min-cut value. In particular, in
wireless networks such as sensor networks where communication is restricted to short packet lengths,
the coding vector overhead can be a significant fraction of the overall packet length [7], [13].

Use of coding vectors is akin to use of training symbols to learn the transformation induced by a 
network. A different approach is to assume a non-coherent scenario for communication, as proposed
in [6], where neither the source(s) nor the receiver(s) have any knowledge of the network topology
or the network nodes' operations. Non-coherent communication allows for creating end-to-end systems
completely oblivious to the network state. Several natural questions arise considering this non-coherent 
framework: (i) what are the fundamental limits on the rates that can be achieved in a network where the 
intermediate node operations are unknown, (ii) how can they be achieved, and (iii) how do they compare 
to the coherent case. 

In this work we address such questions for two different cases. First, we consider the scenario where 
a single source aims to transmit information to one or multiple receiver(s) over a network under the non- 
coherence assumption using fixed packet length. Because network nodes only perform linear operations, 
the overall network behavior from the source(s) to a receiver can be represented as a matrix multiplication 
of the sent source packets. We consider operation in time-slots, and assume that the channel transfer 
matrices are distributed uniformly at random and i.i.d. over different time-slots. Under this probabilistic 
model, we characterize the asymptotic capacity behavior of the introduced channel and show that using 
subspace coding we can achieve the optimal performance. We extend our model to the case of multiple
sources and characterize the asymptotic behavior of the optimal rate region for the case of two sources. We
believe that this result can be extended to the case of more than two sources using the same method that
is applied in §V. For the multi-source case we also prove that encoding information using subspaces
is sufficient to achieve the optimal rate region.

The idea of non-coherent modeling for randomized network coding was first proposed in the seminal
work by Koetter and Kschischang in [6]. In that work, the authors focused on algebraic subspace code
constructions over a Grassmannian. Independently and in parallel to our work in [9], Montanari et al.
[14] introduced a different probabilistic model to capture the end-to-end functionality of non-coherent
network coding operation, with a focus on error correction capabilities. Their model does not
examine subsequent time slots, but instead allows the packet block length (in this paper's terminology,
the packet length $T$) to increase to infinity, with the result that the overhead of coding vectors becomes
negligible very quickly.

Silva et al. [16], independently and subsequent to our works in [9] and [10], also considered a
probabilistic model for non-coherent network coding, which is an extension of the model introduced
in [14] over multiple time-slots. In their model the transfer matrix is constrained to be square as well
as full rank. This is in contrast to our model, where the transfer matrix can have arbitrary dimensions, 
and the elements of the transfer matrix are chosen uniformly at random, with the result that the transfer 
matrix itself may not have full rank (this becomes more pronounced for small matrices). Moreover, we 
extend our work to multiple source multicast, which corresponds to a virtual non-coherent multiple access 
channel (MAC). Our results coincide for the case of a single source, when the packet length and the 
finite field of operations are allowed to grow sufficiently large. Another difference is that the work in 
[16] focuses on additive error with constant dimensions; in contrast, we focus on packet erasures.

Our results can be interpreted as the finite field analog of the Grassmannian packing result for
non-coherent MIMO channels studied in the well-known work [19]. In particular, we show that for
the non-coherent model over finite fields, the capacity critically depends on the relationship between the
"coherence time" (or packet length $T$ in our model) and the min-cut of the network. In fact, the number of
active subspace dimensions depends on this relationship, departing from the non-coherent MIMO analogy
of [19].

The paper is organized as follows. We define our notation and channel model in §II; we state and
discuss our main results in §III; we prove the capacity results for the single and multiple sources in
§IV and §V, respectively; and we conclude the paper in §VI.

All proofs of lemmas and theorems that are missing from the main text are given in Appendix A unless otherwise
stated.

II. Channel Model and Notation

A. Notation 

We here introduce the notation and definitions we use in the following sections. Let $q \ge 2$ be a power
of a prime. In this paper, all vectors and matrices have elements in a finite field $\mathbb{F}_q$. We use $\mathbb{F}_q^{n \times m}$ to
denote the set of all $n \times m$ matrices over $\mathbb{F}_q$, and $\mathbb{F}_q^T$ to denote the set of all row vectors of length $T$.
The set $\mathbb{F}_q^T$ forms a $T$-dimensional vector space over the field $\mathbb{F}_q$.

Throughout the paper, we use capital letters, e.g., X, to denote random objects, including random 
variables, random matrices, or random subspaces, and corresponding lower-case letters, e.g., x to denote 
their realizations. For example, we denote by $\Pi$ a "random subspace" which takes as values the subspaces
in a vector space according to some distribution, and by $\pi$ a specific realization. Also, bold capital
letters, e.g., $\mathbf{A}$, are reserved for deterministic matrices and bold lower-case letters, e.g., $\mathbf{v}$, are used for
deterministic vectors.

For subspaces $\pi_1$ and $\pi_2$, $\pi_1 \subseteq \pi_2$ denotes that $\pi_1$ is a subspace of $\pi_2$. Recall that for two subspaces
$\pi_1$ and $\pi_2$, $\pi_1 \cap \pi_2$ is the intersection of these subspaces, which is itself a subspace. We use $\pi_1 + \pi_2$ to
denote the smallest subspace that contains both $\pi_1$ and $\pi_2$, namely,

$$\pi_1 + \pi_2 = \{\mathbf{v}_1 + \mathbf{v}_2 \mid \mathbf{v}_1 \in \pi_1,\ \mathbf{v}_2 \in \pi_2\}.$$

It is well known that

$$\dim(\pi_1 + \pi_2) = \dim(\pi_1) + \dim(\pi_2) - \dim(\pi_1 \cap \pi_2).$$

For a set of vectors $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ we denote their linear span by $\langle \mathbf{v}_1, \ldots, \mathbf{v}_k \rangle$. For a matrix $X$, $\langle X \rangle$
is the subspace spanned by the rows of $X$ and $\langle X \rangle_c$ is the subspace spanned by the columns of $X$. We
then have $\operatorname{rank}(X) = \dim(\langle X \rangle) = \dim(\langle X \rangle_c)$.

We use calligraphic symbols, e.g., $\mathcal{X}$ or $\mathcal{Y}$, to denote a set of matrices. To denote a set of subspaces
we use the same calligraphic symbols but with a tilde, e.g., $\tilde{\mathcal{X}}$ or $\tilde{\mathcal{Y}}$.

We use the symbols "$\preceq$" and "$\succeq$" to denote the element-wise inequality between vectors and matrices
of the same size.

For two real-valued functions $f(x)$ and $g(x)$ of $x$, we use $f(x) \doteq g(x)$ to denote that²

$$\lim_{x \to \infty} \frac{\log f(x)}{\log g(x)} = 1.$$

2 One has to specify the growing variable whenever "$\doteq$" is used for multi-variate functions. However, since in this work the
growing variable is always $q$, the field size, we will not repeat it for the sake of brevity.

Note that this definition of "$\doteq$" is different from the more standard definition, which is $\lim_{x \to \infty} \frac{1}{x} \log \frac{f(x)}{g(x)} = 0$.
We also use a similar definition for $f \mathrel{\dot{\le}} g$ to denote that

$$\lim_{x \to \infty} \frac{\log f(x)}{\log g(x)} = c \le 1,$$

where $c$ is a constant.

We use the big-$O$ notation, which is defined as follows. Let $f(x)$ and $g(x)$ be two functions defined
on some subset of the real numbers. We write $f(x) = O(g(x))$ as $x \to \infty$ if there exist a positive real
number $M$ and a real number $x_0$ such that $|f(x)| \le M |g(x)|$ for all $x > x_0$. For the little-$o$ notation
we use the following definition. We write $f(x) = o(g(x))$ as $x \to \infty$ if for all $\epsilon > 0$ there exists a real
number $x_0$ such that $|f(x)| \le \epsilon \cdot |g(x)|$ for all $x > x_0$. We also use the big-$\Omega$ notation, which is defined
as follows. We write $f(x) = \Omega(g(x))$ as $x \to \infty$ if we have $g(x) = O(f(x))$ as $x \to \infty$. Finally, we
use the big-$\Theta$ notation to denote that a function is bounded both above and below by another function
asymptotically. Formally, we write $f(x) = \Theta(g(x))$ as $x \to \infty$ if and only if we have $f(x) = O(g(x))$
and $f(x) = \Omega(g(x))$ as $x \to \infty$.

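Definition 1 (Grassmannian and Gaussian coefficient [22], [25]): The Grassmannian $\mathrm{Gr}(T, d)_q$ is the
set of all $d$-dimensional subspaces of the $T$-dimensional space over a finite field $\mathbb{F}_q$, namely,

$$\mathrm{Gr}(T, d)_q = \{\pi \subseteq \mathbb{F}_q^T : \dim(\pi) = d\}.$$

The cardinality of $\mathrm{Gr}(T, d)_q$ is the Gaussian coefficient, namely,

$$\begin{bmatrix} T \\ d \end{bmatrix}_q = |\mathrm{Gr}(T, d)_q| = \frac{(q^T - 1)(q^{T-1} - 1) \cdots (q^{T-d+1} - 1)}{(q^d - 1)(q^{d-1} - 1) \cdots (q - 1)}. \tag{1}$$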
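As a numerical illustration of (1), the following Python sketch evaluates the Gaussian coefficient with exact integer arithmetic and compares it with $q^{d(T-d)}$. The function name `gaussian_coefficient` and the chosen parameters are ours and are not taken from the paper.

```python
def gaussian_coefficient(T: int, d: int, q: int) -> int:
    """Number of d-dimensional subspaces of F_q^T, following (1)."""
    if d < 0 or d > T:
        return 0
    num, den = 1, 1
    for j in range(d):
        num *= q ** (T - j) - 1   # (q^T - 1)(q^{T-1} - 1)...(q^{T-d+1} - 1)
        den *= q ** (d - j) - 1   # (q^d - 1)(q^{d-1} - 1)...(q - 1)
    return num // den             # the quotient is always an integer


if __name__ == "__main__":
    T, d = 6, 2
    for q in (2, 16, 256):
        exact = gaussian_coefficient(T, d, q)
        # The ratio to q^{d(T-d)} approaches 1 as the field size grows.
        print(q, exact, exact / q ** (d * (T - d)))
```

For small fields the ratio is visibly larger than 1, which is why the approximations used later are stated for large $q$.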

Definition 2 (The set $\mathrm{Sp}(T, m)_q$): We define $\mathrm{Sp}(T, m)_q$ to be the set (sphere) of all subspaces of
dimension at most $m$ in the $T$-dimensional space $\mathbb{F}_q^T$, namely

$$\mathrm{Sp}(T, m)_q = \bigcup_{d=0}^{\min[m,T]} \mathrm{Gr}(T, d)_q = \{\pi \subseteq \mathbb{F}_q^T : \dim(\pi) \le \min[m, T]\}.$$

The cardinality of $\mathrm{Sp}(T, m)_q$ equals

$$S(T, m)_q = |\mathrm{Sp}(T, m)_q| = \sum_{d=0}^{\min[m,T]} |\mathrm{Gr}(T, d)_q|.$$
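To make Definition 2 concrete, the sketch below enumerates $\mathrm{Sp}(T, m)$ by brute force. It is restricted to $q = 2$, an assumption made only so that vectors can be stored as bit masks and added with XOR; the helper names `span_gf2` and `subspaces_up_to_dim` are hypothetical.

```python
from itertools import product


def span_gf2(vectors):
    """All F_2-linear combinations of the given vectors (integers used as bit masks)."""
    space = {0}
    for v in vectors:
        # Adding a generator either leaves the span unchanged or doubles it.
        space |= {u ^ v for u in space}
    return frozenset(space)


def subspaces_up_to_dim(T, m):
    """Brute-force Sp(T, m) over F_2: every subspace of dimension at most min(m, T)."""
    k = min(m, T)
    found = set()
    # Any subspace of dimension <= k is the span of some k (possibly zero) vectors.
    for generators in product(range(2 ** T), repeat=k):
        found.add(span_gf2(generators))
    return found


if __name__ == "__main__":
    sphere = subspaces_up_to_dim(4, 2)
    print(len(sphere))  # S(4, 2) for q = 2
```

The enumeration is exponential in $T$ and $m$, so it is only meant as a sanity check for very small parameters.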

Definition 3 (The number $\psi(T, n, \pi_d)_q$): We denote by $\psi(T, n, \pi_d)_q$ the number of different $n \times T$
matrices with elements from a field $\mathbb{F}_q$ such that their rows span a specific subspace $\pi_d \subseteq \mathbb{F}_q^T$ of
dimension $0 \le d \le \min[n, T]$.

For simplicity, in the rest of the paper we will drop the subscript q in the previous definitions whenever 
it is obvious from the context. 

B. Preliminary Lemmas

We here state some preliminary lemmas related to the definitions introduced in §II-A.
Existing bounds in the literature allow us to approximate the Gaussian number; for example, we have
from [6, Lemma 4] and [23, Section III] that

$$q^{d(T-d)} < \begin{bmatrix} T \\ d \end{bmatrix}_q < \frac{q^{d(T-d)}}{\prod_{j=1}^{\infty} (1 - q^{-j})} < 4 q^{d(T-d)}, \quad \forall d : 0 < d < T. \tag{2}$$

Using Definition 1 and (2) we have Lemma 1.

Lemma 1: For large $q$ we can approximate the Gaussian number as follows

$$\begin{bmatrix} T \\ d \end{bmatrix}_q = q^{d(T-d)} \left(1 + O(q^{-1})\right) \doteq q^{d(T-d)}.$$

Lemma 2: For $\psi(T, n, \pi_d)$ given in Definition 3 we have that [26]

$$\psi(T, n, \pi_d) = \prod_{i=0}^{d-1} (q^n - q^i) = q^{\binom{d}{2}} \prod_{i=0}^{d-1} (q^{n-i} - 1),$$

i.e., it does not depend on $T$.
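Lemma 2 can be checked by exhaustive counting for tiny parameters. The sketch below does this over $\mathbb{F}_2$ (an assumption made for convenience, with vectors stored as bit masks); the names `psi`, `span_gf2`, and `count_matrices_spanning` are illustrative and not from the paper.

```python
from itertools import product


def psi(n, d, q):
    """Closed form of Lemma 2: prod_{i=0}^{d-1} (q^n - q^i)."""
    result = 1
    for i in range(d):
        result *= q ** n - q ** i
    return result


def span_gf2(vectors):
    """All F_2-linear combinations of the given vectors (integers used as bit masks)."""
    space = {0}
    for v in vectors:
        space |= {u ^ v for u in space}
    return frozenset(space)


def count_matrices_spanning(n, target):
    """Count n-row binary matrices whose row span equals the subspace `target`."""
    # Every row must lie in `target`, so it suffices to draw rows from it.
    candidates = sorted(target)
    return sum(1 for rows in product(candidates, repeat=n)
               if span_gf2(rows) == target)


if __name__ == "__main__":
    plane = span_gf2([0b001, 0b010])   # a 2-dimensional subspace of F_2^3
    print(count_matrices_spanning(3, plane), psi(3, 2, 2))
```

The count depends only on $n$ and on the dimension of the target subspace, which is the content of the lemma.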

Since $\psi(T, n, \pi_d)$ does not depend on $T$, and only depends on $\pi_d$ through its dimension, as a shorthand
notation we will also use $\psi(n, d)$ instead of $\psi(T, n, \pi_d)$, where $d = \dim(\pi_d)$.
Using Lemma 2, the following lower and upper bounds are straightforward

$$\left(1 - d q^{-n+d-1}\right) \le \left(1 - q^{-n+d-1}\right)^d \le \frac{\psi(n, d)}{q^{nd}} \le 1, \tag{3}$$

which imply Lemma 3 (see also [23]).

Lemma 3: For large values of $q$ the following approximation holds

$$\psi(n, d) = q^{nd} \left(1 + O(q^{-1})\right) \doteq q^{nd}.$$

It is also worthwhile to mention that $\psi(n, d) \begin{bmatrix} T \\ d \end{bmatrix}_q$ is the number of $n \times T$ matrices of rank $d$. We can
count all the $n \times T$ matrices through the following Lemma 4 (also see [22], [25], and [26, Corollary 5]).

Lemma 4: For every $n > 0$ and $T > 0$ we can write

$$\sum_{d=0}^{\min[n,T]} \psi(n, d) \begin{bmatrix} T \\ d \end{bmatrix}_q = q^{nT},$$

where $\psi(n, 0) = 1$.
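The identity of Lemma 4 is easy to confirm numerically. The following sketch lists the number of $n \times T$ matrices of each rank and checks that the counts add up to $q^{nT}$; function names and test parameters are our own choices.

```python
def gaussian_coefficient(T, d, q):
    """Number of d-dimensional subspaces of F_q^T (Definition 1)."""
    num, den = 1, 1
    for j in range(d):
        num *= q ** (T - j) - 1
        den *= q ** (d - j) - 1
    return num // den


def psi(n, d, q):
    """Number of n-row matrices whose rows span a fixed d-dimensional subspace (Lemma 2)."""
    result = 1
    for i in range(d):
        result *= q ** n - q ** i
    return result


def matrices_by_rank(n, T, q):
    """Entry d is the number of n x T matrices over F_q of rank exactly d."""
    return [psi(n, d, q) * gaussian_coefficient(T, d, q)
            for d in range(min(n, T) + 1)]


if __name__ == "__main__":
    counts = matrices_by_rank(3, 4, 2)
    print(counts, sum(counts) == 2 ** (3 * 4))
```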

C. The Non-Coherent Finite Field Channel Model

We consider a network where nodes perform random linear network coding over a finite field $\mathbb{F}_q$. We
are interested in the maximum information rate at which a single (or multiple) source(s) can successfully 
communicate over such a network when neither the transmitter nor the receiver(s) have any channel state 
information (CSI). For simplicity, we will present the channel model and our analysis for the case of a 
single receiver; the extension to multiple receivers (with the same channel parameters) is straightforward, 
as we also discuss in the results section. 

We assume that time is slotted and the channel is block time-varying. For single-source communication,
at time slot $t$, the receiver observes

$$Y[t] = G[t] X[t], \tag{4}$$

where $X[t] \in \mathbb{F}_q^{m \times T}$, $G[t] \in \mathbb{F}_q^{n \times m}$, and $Y[t] \in \mathbb{F}_q^{n \times T}$. At each time-slot, the receiver receives $n$ packets
of length $T$ (captured by the rows of matrix $Y[t]$) that are random linear combinations of the $m$ packets
injected by the source (captured by the rows of matrix $X[t]$). In our model, the packet length $T$ can
be interpreted as the coherence time of the channel, during which the transfer matrix remains constant.
Each element of the transfer matrix $G[t]$ is chosen uniformly at random from $\mathbb{F}_q$, changes independently
from time slot to time slot, and is unknown to both the source and the receiver. In other words, the
channel transfer matrix is chosen uniformly at random from all possible matrices in $\mathbb{F}_q^{n \times m}$ and is i.i.d.
over different blocks. In general, the topology of the network may impose some constraints
on the transfer matrix $G[t]$ (for example, some entries might be zero; see [?], [8], [20], [21]). However, we
believe that this is a reasonable general model, especially for large-scale dynamically-changing networks
where apart from random coefficients there exist many other sources of randomness. Formally, we define
the non-coherent matrix channel as follows.

Definition 4 (Non-coherent matrix channel $\mathrm{Ch}_m$): This is defined to be the matrix channel $\mathrm{Ch}_m : \mathcal{X} \to \mathcal{Y}$
described by (4) with the assumption that $G[t]$ is i.i.d. and uniformly distributed over all
matrices in $\mathbb{F}_q^{n \times m}$. It is a discrete memoryless channel with input alphabet $\mathcal{X} = \mathbb{F}_q^{m \times T}$ and output alphabet
$\mathcal{Y} = \mathbb{F}_q^{n \times T}$.

The capacity of the channel $\mathrm{Ch}_m$ is given by

$$C_m = \max_{P_X(x)} I(X; Y), \tag{5}$$

where $P_X(x)$ is the input distribution. To achieve the capacity a coding scheme may employ the channel
given in (4) multiple times, and a codeword is a sequence of input matrices from $\mathcal{X}$. For a coding strategy
that induces an input distribution $P_X(x)$, the achievable rate is

$$R = I(X; Y).$$
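A single use of the channel in (4) can be simulated directly. The sketch below assumes a prime field size $p$, so that arithmetic modulo $p$ is field arithmetic (general prime powers would need a proper finite-field implementation); all function names are hypothetical. It illustrates that the row space of $Y$ is always contained in the row space of $X$, which is what motivates working with subspaces.

```python
import random


def random_matrix(rows, cols, p, rng):
    """Matrix with i.i.d. entries drawn uniformly from F_p."""
    return [[rng.randrange(p) for _ in range(cols)] for _ in range(rows)]


def matmul_mod(A, B, p):
    """Matrix product A * B with entries reduced modulo p."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)]
            for row in A]


def rank_mod(M, p):
    """Rank over F_p (p prime) by Gauss-Jordan elimination on a copy of M."""
    M = [row[:] for row in M]
    rank = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((r for r in range(rank, len(M)) if M[r][c] % p), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][c], -1, p)          # modular inverse (Python 3.8+)
        M[rank] = [(x * inv) % p for x in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][c] % p:
                f = M[r][c]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[rank])]
        rank += 1
    return rank


def channel_use(X, n, p, rng):
    """One time slot of (4): draw a fresh uniform n x m matrix G and return Y = G X."""
    G = random_matrix(n, len(X), p, rng)
    return matmul_mod(G, X, p)


if __name__ == "__main__":
    rng = random.Random(1)
    X = random_matrix(3, 6, 5, rng)           # m = 3 packets of length T = 6 over F_5
    Y = channel_use(X, 4, 5, rng)             # n = 4 received packets
    print(rank_mod(X, 5), rank_mod(Y, 5), rank_mod(X + Y, 5))
```

Stacking $X$ on top of $Y$ never increases the rank beyond that of $X$, so the receiver can learn at most the subspace spanned by the transmitted packets.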

Now we define a non-coherent subspace channel $\mathrm{Ch}_s$ which takes as input a subspace and outputs
another subspace. Then, in Theorem 1 we will show that the two channels $\mathrm{Ch}_m$ and $\mathrm{Ch}_s$ are equivalent
from the point of view of calculating the mutual information between their inputs and their outputs.

Definition 5 (Non-coherent subspace channel $\mathrm{Ch}_s$): This is defined to be the channel $\mathrm{Ch}_s : \tilde{\mathcal{X}} \to \tilde{\mathcal{Y}}$
with input alphabet $\tilde{\mathcal{X}} = \mathrm{Sp}(T, m)$ and output alphabet $\tilde{\mathcal{Y}} = \mathrm{Sp}(T, n)$ and transition probability

$$P_{\Pi_Y | \Pi_X}(\pi_y | \pi_x) = \begin{cases} \psi(T, n, \pi_y)\, q^{-n \dim(\pi_x)} & \pi_y \subseteq \pi_x, \\ 0 & \text{otherwise}, \end{cases} \tag{6}$$

where $\Pi_X$ and $\Pi_Y$ are the input and output variables of the channel $\mathrm{Ch}_s$.
The capacity of the channel $\mathrm{Ch}_s$ is given by

$$C_s = \max_{P_{\Pi_X}(\pi_x)} I(\Pi_X; \Pi_Y),$$

where $P_{\Pi_X}(\pi_x)$ is the input distribution defined over the set of subspaces $\tilde{\mathcal{X}}$.

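We next consider a multiple-source scenario, and the multiple access channel corresponding to (4).
In this case, we have

$$Y[t] = \sum_{i=1}^{N_s} G_i[t] X_i[t], \tag{7}$$

where $N_s$ is the number of sources, and each source $i$ inserts $m_i$ packets into the network. Thus, $X_i[t] \in \mathbb{F}_q^{m_i \times T}$,
$G_i[t] \in \mathbb{F}_q^{n \times m_i}$, and $Y[t] \in \mathbb{F}_q^{n \times T}$. We can also collect all $G_i[t]$ in an $n \times \sum_{i=1}^{N_s} m_i$ matrix $G_{\mathrm{MAC}}[t]$
and all $X_i[t]$ in a $\sum_{i=1}^{N_s} m_i \times T$ matrix $X_{\mathrm{MAC}}[t]$ as follows

$$X_{\mathrm{MAC}}[t] = \begin{bmatrix} X_1[t] \\ \vdots \\ X_{N_s}[t] \end{bmatrix} \quad \text{and} \quad G_{\mathrm{MAC}}[t] = \begin{bmatrix} G_1[t] & \cdots & G_{N_s}[t] \end{bmatrix},$$

so we can rewrite (7) as

$$Y[t] = G_{\mathrm{MAC}}[t] X_{\mathrm{MAC}}[t].$$

Each source $i$ then controls $m_i$ rows of the matrix $X_{\mathrm{MAC}}[t]$. Again we assume that each entry of the
matrices $G_i[t]$ is chosen i.i.d. and uniformly at random from the field $\mathbb{F}_q$ for all source nodes and all
time instances.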

Definition 6 (The non-coherent multiple access matrix channel $\mathrm{Ch}_{m\text{-MAC}}$): This is defined to be the
channel $\mathrm{Ch}_{m\text{-MAC}} : \mathcal{X}_1 \times \cdots \times \mathcal{X}_{N_s} \to \mathcal{Y}$ described in (7), with the assumption that $G_i[t]$, $i = 1, \ldots, N_s$,
are i.i.d. and uniformly distributed over all matrices in $\mathbb{F}_q^{n \times m_i}$, $i = 1, \ldots, N_s$. It forms a discrete memoryless
MAC with input alphabets $\mathcal{X}_i = \mathbb{F}_q^{m_i \times T}$, $i = 1, \ldots, N_s$, and output alphabet $\mathcal{Y} = \mathbb{F}_q^{n \times T}$.
It is well known [15] that the rate region of any multiple access channel, including $\mathrm{Ch}_{m\text{-MAC}}$, is given by
the closure of the convex hull of the rate vectors satisfying

$$R_S \le I(X_S; Y | X_{S^c}) \quad \text{for all } S \subseteq \{1, \ldots, N_s\},$$

for some product distribution $P_{X_1}(x_1) \cdots P_{X_{N_s}}(x_{N_s})$. Note that $R_S = \sum_{i \in S} R_i$, where $R_i$ is the transmission
rate of the $i$th source, $X_S = \{X_i : i \in S\}$, and $S^c$ is the complement set of $S$.

November 17, 2010 DRAFT

As before, we define a non-coherent subspace version³ of the matrix multiple access channel and in
Theorem 6 we show that from the point of view of rate region these two channels are equivalent.

Definition 7 (Non-coherent subspace multiple access channel $\mathrm{Ch}_{s\text{-MAC}}$): This is defined to be the channel
$\mathrm{Ch}_{s\text{-MAC}} : \tilde{\mathcal{X}}_1 \times \tilde{\mathcal{X}}_2 \to \tilde{\mathcal{Y}}$ with input alphabets $\tilde{\mathcal{X}}_i = \mathrm{Sp}(T, m_i)$, $i = 1, 2$, output alphabet $\tilde{\mathcal{Y}} = \mathrm{Sp}(T, n)$,
and transition probability

$$\Pr(\Pi_Y = \pi_y \mid \Pi_{X_1} = \pi_1, \Pi_{X_2} = \pi_2) = \begin{cases} \psi(T, n, \pi_y)\, q^{-n \dim(\pi_1 + \pi_2)} & \pi_y \subseteq \pi_1 + \pi_2, \\ 0 & \text{otherwise}, \end{cases} \tag{8}$$

where $\Pi_{X_1}$ and $\Pi_{X_2}$ are the input variables and $\Pi_Y$ is the output variable of the channel $\mathrm{Ch}_{s\text{-MAC}}$.

III. Main Results 

A. Single Source 

Our main results, Theorem 2 and Theorem 3, characterize the capacity for non-coherent network coding
for the model given in (4). We show that the capacity is achieved through subspace coding, where the
information is communicated from the source to the receivers through the choice of subspaces. Formally, 
we have the following results. 

Theorem 1: The matrix channel $\mathrm{Ch}_m : \mathcal{X} \to \mathcal{Y}$ defined in Definition 4 and the subspace channel
$\mathrm{Ch}_s : \tilde{\mathcal{X}} \to \tilde{\mathcal{Y}}$ defined in Definition 5 are equivalent in terms of evaluating the mutual information
between the input and output. More precisely, for every input distribution for the channel $\mathrm{Ch}_s$ there is
an input distribution for the channel $\mathrm{Ch}_m$ such that $I(X; Y) = I(\Pi_X; \Pi_Y)$ and vice versa. As a result,
these channels have the same capacity $C_m = C_s$.

For the proof of Theorem 1 refer to Appendix A, and for more discussion refer to §IV-A.

3 For simplicity, we restrict this definition to only two source nodes. However, generalization to $N_s$ sources is straightforward.

Theorem 2: For the channel $\mathrm{Ch}_m : \mathcal{X} \to \mathcal{Y}$ defined in Definition 4, the capacity is given by

$$C_m = i^*(T - i^*) \log_2 q + o(1), \tag{9}$$

where $i^* = \min[m, n, \lfloor T/2 \rfloor]$, and $o(1)$ tends to zero as $q$ grows.
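The leading term of (9) is simple to evaluate. The sketch below computes $i^*$ and $i^*(T - i^*) \log_2 q$ while ignoring the $o(1)$ correction, so it is only meaningful for large $q$; the function names are ours, and the example parameters $m = 11$, $n = 7$ are those quoted in the caption of Fig. 1.

```python
import math


def i_star(m, n, T):
    """Optimal subspace dimension i* = min[m, n, floor(T/2)] from Theorem 2."""
    return min(m, n, T // 2)


def capacity_leading_term(m, n, T, q):
    """i*(T - i*) log2(q): the capacity in (9) without its o(1) term."""
    i = i_star(m, n, T)
    return i * (T - i) * math.log2(q)


if __name__ == "__main__":
    for T in (7, 10, 13):
        print(T, i_star(11, 7, T), capacity_leading_term(11, 7, T, 2))
```

With $q = 2$ the printed values are simply $i^*(T - i^*)$ in bits per channel use.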

Theorem 2 is proved in §IV-B. The result of Theorem 2 is for the large alphabet regime.⁴ The following
result, Theorem 3, is valid for a finite field size, and therefore is a non-asymptotic result.

Theorem 3: Consider the channel $\mathrm{Ch}_m : \mathcal{X} \to \mathcal{Y}$ defined in Definition 4. There exists a finite
number $q_0$ such that for $q > q_0$ the optimal input distribution is nonzero only for matrices of rank
in the set

$$\mathcal{A} = \{\min[(T - n)^+, m, n, T], \ldots, \min[m, n, T]\}. \tag{10}$$

Moreover, for all values of $q$ the optimal input distribution is uniform over all matrices $X$ of the same
rank, and the total probability allocated to transmitting matrices of rank $i$ equals

$$\alpha_i^* = P[\operatorname{rank}(X) = i] = 2^{-C_m} q^{i(T-i)} [1 + o(1)], \quad \forall i \in \mathcal{A}. \tag{11}$$
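The active set in (10) and the leading-order weights in (11) can be tabulated as follows. This is a sketch under the assumption that the $[1 + o(1)]$ factor is dropped and the weights $q^{i(T-i)}$ are normalized over the active set; `active_ranks` and `approx_rank_pmf` are hypothetical names.

```python
def active_ranks(m, n, T):
    """Ranks with nonzero probability according to (10)."""
    low = min(max(T - n, 0), m, n, T)    # min[(T - n)^+, m, n, T]
    high = min(m, n, T)
    return list(range(low, high + 1))


def approx_rank_pmf(m, n, T, q):
    """Leading-order version of (11): weights q^{i(T-i)} normalized over the active set."""
    ranks = active_ranks(m, n, T)
    weights = [q ** (i * (T - i)) for i in ranks]
    total = sum(weights)
    return {i: w / total for i, w in zip(ranks, weights)}


if __name__ == "__main__":
    # m = 4, n = 3 as in Fig. 2: the active set shrinks as T grows.
    for T in (2, 3, 4, 5, 6, 9):
        print(T, active_ranks(4, 3, T))
```

For $m = 4$, $n = 3$ the set contains every dimension up to $\min[m, n, T]$ when $T \le n$, loses its smallest elements as $T$ grows, and collapses to the single rank $\min[m, n]$ once $T \ge n + \min[m, n]$.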

The proof of Theorem 3 is presented in §IV-C and §IV-D and uses standard techniques from convex
optimization, as well as large field size approximations. Note that the same coding scheme at the source
simultaneously achieves the capacity for all receivers with the same channel parameters (i.e., values of
$n$, $m$ and $T$). That is, each receiver is able to successfully decode.

The result of Theorem 3 for the active set of input dimensions is not asymptotic in $q$. However, it
is not easy to analytically find the minimum value of $q_0$ such that the theorem statement holds for all
$q > q_0$. Theorem 4 demonstrates how we can analytically characterize $q_0$ given in Theorem 3 for the
case $T > n + \min[m, n]$. The proof of Theorem 4 is presented in §IV-E.

Theorem 4: If $T > n + \min[m, n]$, then the capacity of $\mathrm{Ch}_m$ for $q > q_0$ is given by



1=0 




i*(T - i*) log 2 q - l { „< m} (T - i*) 1 ^ + q -l + o(g -l )> (12) 



where $\mathbb{1}_{\{\cdot\}}$ is the indicator function and $q_0$ is the minimum field size that satisfies the set of inequalities

<log 2 (7o, VZ:0<i<(i*-l), 



e?o(0-e ?o 0*) 



(T-n-i*)(i* - I) 



4 We gratefully acknowledge the contribution of an anonymous reviewer who gave an alternate proof, which focused on the
asymptotic $q$ regime. We have included that proof in §IV-B. Our original proof was based partially on the proof now given for
Theorem 3.

and



i*(l — i*) 



< log 2 qo, VZ : (i* + 1) < I < m, 



where $i^* = \min[m, n]$ and



min[n,i] 



A. 



g ni log 2 



rT 



min[n, Z](T — i*). 



The capacity is achieved by sending matrices $X$ such that their rows span different $i^*$-dimensional
subspaces.

Moreover, asymptotically in T, we can show that ~ m+1 > 5 m 2 is sufficient for the case m < n and 
Qo > n ^ is sufficient if m > n. 

Theorems 2 and 3 state that the capacity behaves as $i^*(T - i^*) \log_2 q$ for sufficiently large $q$. However,
numerical simulations indicate a very fast convergence to this value as $q$ increases. Fig. 1 depicts the
capacity for small values of $q$, calculated using the Differential Evolution toolbox for MATLAB [11].
This shows that the result is relevant at much lower field sizes than dictated by the formalism of the
statements of Theorems 2 and 3.



[Plot: capacity curves for $T = 7$, $T = 10$, and $T = 13$; horizontal axis $\log_2 q$.]

Fig. 1. Numerical calculation of the capacity for small values of $q$ and $m = 11$, $n = 7$. The dotted line depicts $i^*(T - i^*)$.

From Theorem 3 we can derive the following guidelines for non-coherent network code design.

1) Choice of subspaces: The optimal input distribution uses subspaces of a single dimension equal
to $\min[m, n]$ for $T > \min[m, n] + n$. As $T$ decreases, the set of used subspaces gradually grows, by
activating one by one smaller and smaller dimensional subspaces, until, for $T < n$, all subspaces are
used with equal probability.⁵ Fig. 2 pictorially depicts this gradual inclusion of subspaces.

This behavior is different from the result of [16], where all the subspaces up to dimension equal to the
min-cut appeared in the optimal input distribution. This difference is due to the different channel model
used in our work and in [16].



[Plot: three panels, one per regime: $T < n$; $n < T < n + \min[m, n]$; $n + \min[m, n] < T$. The horizontal axis of each panel is marked 1 to 4.]

Fig. 2. Probability mass function of the active subspace dimensions for channel parameters $m = 4$, $n = 3$. As shown in
Theorem 3, there exist three different regimes.



2) Values of $m$ and $n$: For a given and fixed packet length $T$, the optimal value of $m$ and $n$ equals
$m = n = \lfloor T/2 \rfloor$ (optimality is in the sense of the minimum requirement needed to obtain the maximum
capacity for this $T$). For fixed $T$ and $m$, the optimal value of $n$ equals $n = \min[m, \lfloor T/2 \rfloor]$. For fixed $T$
and $n$, the optimal value of $m$ equals $m = \min[n, \lfloor T/2 \rfloor]$.

TABLE I

Information loss from using coding vectors when $n = m$.

                 | $T < 2m$ | $T > 2m$
$C_m - R_{cv}$   | $o(1)$   | $o(1) = (i^* - 1)(T - i^*) \frac{\log_2 q}{q} + O(q^{-1})$



3) Subspace coding vs. coding vectors: One of the aims of this work was to find the regimes in which
the use of coding vectors [12] is far from optimal. Table I summarizes this difference. As we see from
Table I, subspace coding does not offer benefits as compared to the coding vectors approach for large
field size.⁶

5 Note that although all the subspaces are equiprobable, we have distinct values for $\alpha_i^*$ since there are different numbers of
subspaces of each dimension.

6 In the algebraic framework of [6], the lifting construction used coding vectors, and they showed that this construction achieves
almost the same rates as optimal algebraic subspace codes. However, we demonstrate in this paper that this phenomenon occurs
for longer packet lengths using an information-theoretic framework.

Table I is calculated as follows. The achievable rate $R_{cv}$ using coding vectors equals

$$R_{cv} = P[\operatorname{rank}(G_k) = k]\, k (T - k) \log_2 q,$$

where $0 \le k \le m$ is the number of packets in each generation, i.e., each packet includes a coding vector
of length $k$ and $T - k$ information symbols. Equivalently, we assume that we use $k$ of the $m$ possible
input packets. The matrix $G_k$ is the $k \times k$ sub-matrix of $G$ that is applied over the input packets. To
calculate $R_{cv}$, we know that $P[\operatorname{rank}(G_k) = k] = \prod_{i=0}^{k-1} (1 - q^{-k+i}) = 1 - q^{-1} + O(q^{-2})$. Assuming we
choose $k = i^*$, we have $R_{cv} = i^*(T - i^*) \log_2 q - i^*(T - i^*) \frac{\log_2 q}{q}$, where $i^* = \min[m, n, \lfloor T/2 \rfloor]$. For the
capacity $C_m$ we use the large-$q$ regime as considered in Theorem 2 for the case $T < 2m$ and the finite-$q$
regime of Theorem 4 for the case $T > 2m$.
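The coding-vector rate above can be evaluated exactly for any field size. The following sketch computes the full-rank probability of a uniform $k \times k$ matrix and the resulting $R_{cv}$; the function names are illustrative only.

```python
import math


def prob_full_rank(k, q):
    """P[rank(G_k) = k] for a uniform k x k matrix over F_q: prod_{i=0}^{k-1} (1 - q^{i-k})."""
    p = 1.0
    for i in range(k):
        p *= 1.0 - q ** (i - k)
    return p


def coding_vector_rate(k, T, q):
    """R_cv = P[rank(G_k) = k] * k * (T - k) * log2(q)."""
    return prob_full_rank(k, q) * k * (T - k) * math.log2(q)


if __name__ == "__main__":
    k, T = 3, 10
    for q in (2, 16, 256):
        ideal = k * (T - k) * math.log2(q)
        # The gap to k(T - k) log2(q) shrinks roughly like (log2 q) / q.
        print(q, coding_vector_rate(k, T, q), ideal - coding_vector_rate(k, T, q))
```

For $q = 2$ the rank-deficiency penalty is substantial, whereas for $q = 256$ the full-rank probability is already close to $1 - q^{-1}$.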

B. Extension to the packet erasure networks 

After the error-free single-source scenario, we consider packet erasure networks and calculate
upper and lower bounds on the capacity for this case. The work in [16], which is the closest to ours,
did not consider erasures but instead constant-dimension additive errors. In practice, depending on the
application, either of the models might be more suitable: for example, if network coding is deployed at
an application layer, then, unless there exist malicious attackers, packet erasures are typically used to
abstract both the underlying physical channel errors and packets dropped at queues or lost due to
expired timers.

We model the erasures in the network as an end-to-end phenomenon which randomly erases packets
according to some probability distribution. Formally, we rewrite the channel defined in (4) as

$$Y[t] = E[t] G[t] X[t], \tag{13}$$

where $G \in \mathbb{F}_q^{m \times m}$ is assumed to be a square channel matrix and $E \in \mathbb{F}_q^{m \times m}$ is a diagonal random matrix
whose diagonal elements are either 1 or 0. We also assume that $q$ is large, and as a result the
transfer matrix is full rank with high probability. Moreover, we consider the case where $m < T/2$, i.e., the
matrix $X$ is a fat matrix. Recall that we can think of the rows of this matrix as packets sent by the
source, and the rows of the $Y$ matrix as packets received at the destination.

Note that in equation (13) all of the erasure events are captured by the erasure matrix $E[t]$. Moreover,
the erasure pattern is important only up to determining the number of packets that the destination receives,
since the transfer matrix $G[t]$ is unknown and distributed uniformly at random over all full-rank matrices.
Thus, we model the number of received packets (the number of non-zero elements on the diagonal of $E[t]$)
as a random variable $N$ which takes values in $0 \le N \le m$ according to some distribution that depends
on the packet erasures in the network. In this case the capacity is

$$C_e = \max_{P_X} I(X; Y, N).$$
Px 

We can then use our previous result, Theorem 2, to find an upper and lower bound for the capacity $C_e$
when we have packet erasures in the network, as the following Theorem 5 describes.

Theorem 5: Let the number of received packets at the destination be a random variable $N$ defined
over the set of integers $0 \le N \le m$. Also, assume that $m < T/2$. Then for large $q$, we have the following
upper and lower bounds for the capacity $C_e$,

$$\mu_1 (T - m) \log_2 q \le C_e \le \mu_1 \left(T - \frac{\mu_2}{\mu_1}\right) \log_2 q,$$

where $\mu_1 = E_N[N]$ and $\mu_2 = E_N[N^2]$.
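Given a distribution for $N$, the two bounds of Theorem 5 follow from its first two moments. The sketch below evaluates them; the representation of the distribution as a list `pmf` with `pmf[k] = P[N = k]` and the function name are assumptions made for illustration.

```python
import math


def erasure_capacity_bounds(pmf, T, q):
    """Lower and upper bounds of Theorem 5, where pmf[k] = P[N = k] for k = 0..m."""
    m = len(pmf) - 1
    if not m < T / 2:
        raise ValueError("Theorem 5 assumes m < T/2")
    mu1 = sum(k * p for k, p in enumerate(pmf))        # E[N]
    mu2 = sum(k * k * p for k, p in enumerate(pmf))    # E[N^2]
    log_q = math.log2(q)
    lower = mu1 * (T - m) * log_q
    # mu1 * (T - mu2 / mu1) is written as mu1 * T - mu2 to avoid dividing by zero.
    upper = (mu1 * T - mu2) * log_q
    return lower, upper


if __name__ == "__main__":
    # m = 2 packets sent; one or both are received with equal probability.
    print(erasure_capacity_bounds([0.0, 0.5, 0.5], T=6, q=2))
```

The upper bound equals $E_N[N(T - N)] \log_2 q$, so the two bounds coincide whenever $N$ only takes the values $0$ and $m$.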

For the proof of Theorem 5 and more discussion refer to Appendix B.

Note that because we do not necessarily employ full-rank matrices X, it is possible that although some 
packets are erased at the destination, the received packets still span a matrix of the same rank as X; thus 
erasing packets is not equivalent to erasing dimensions. 

C. Multiple Sources 

In several practical applications, such as sensor networks, data sources are not necessarily co-located.
We thus extend our work to the case where multiple sources that are not co-located transmit information to
a common receiver. In particular, we consider the non-coherent MAC introduced in Definition 6 and
characterize the capacity region of this network for the case of two sources with $m_1$ and $m_2$ input packets
and packet length $T > 2(m_1 + m_2)$. We believe that this technique can be extended to more than two
sources.

To find the rate region of the matrix multiple access channel $\mathrm{Ch}_{m\text{-MAC}}$, we first show that the two
channels $\mathrm{Ch}_{m\text{-MAC}}$ and $\mathrm{Ch}_{s\text{-MAC}}$ are equivalent, as stated in Theorem 6. We then find the rate region of
the subspace multiple access channel $\mathrm{Ch}_{s\text{-MAC}}$, which is stated in Theorem 7. To avoid repetition, we state
Theorem 6 without a proof because its proof is very similar to that of Theorem 1.

Theorem 6: The matrix MAC $\mathrm{Ch}_{m\text{-MAC}}$ defined in Definition 6 is equivalent to the subspace MAC
$\mathrm{Ch}_{s\text{-MAC}}$ defined in Definition 7 in the sense that the optimal rate region for these two channels is the
same.
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Theorem 7: For ^ > mi + m,2, the asymptotic (in the field size q) capacity region of the MAC 
Ch m _MAC introduced in Definition [6] is given by 

TV = convex hull TZ(d\,d 2 ), 

{d u d 2 )ev* 

where 

K(d 1 ,d 2 ) = {(Ri,R 2 ) ■ Ri < Ri(di,<h), i = 1,2}, (14) 
Ri(di,d 2 ) = di(T -di- d 2 ) log 2 q, i = 1, 2, 

and 

T>* = {(^1,^2) : < di < min[n, mi], < c?i + c?2 < min[n, m\ + m 2 ]}- 

We note that the rate region forms a polytope that has the following number of corner points (see
Corollary 1 in Section V):

$$\min[m_1, (n - m_2)^+] + \min[m_2, (n - m_1)^+] + 2 - \mathbb{1}_{\{n \ge m_1 + m_2\}}.$$

The rate region $\mathcal{R}^*$ is shown in Fig. 3 for a particular choice of parameters.
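
To make the region in Theorem 7 concrete, the following minimal sketch enumerates the dimension pairs in $\mathcal{D}^*$ and evaluates the rate pairs $R_i(d_1,d_2) = d_i(T-d_1-d_2)\log_2 q$. The function names and the numerical parameters are illustrative assumptions and do not appear in the paper; the convex hull itself is not computed.

```python
import math

def admissible_dimensions(m1, m2, n):
    """Dimension pairs (d1, d2) in the set D* of Theorem 7."""
    return [
        (d1, d2)
        for d1 in range(min(n, m1) + 1)
        for d2 in range(min(n, m2) + 1)
        if d1 + d2 <= min(n, m1 + m2)
    ]

def rate_pair(d1, d2, T, q):
    """Rate pair (R1(d1,d2), R2(d1,d2)) in bits per channel use."""
    common = (T - d1 - d2) * math.log2(q)
    return (d1 * common, d2 * common)

# Illustrative parameters (assumed, not taken from the paper).
for d1, d2 in admissible_dimensions(2, 2, 3):
    print((d1, d2), rate_pair(d1, d2, T=10, q=4))
```

The region $\mathcal{R}^*$ is then the convex hull of the rectangles whose upper-right corners are the printed rate pairs.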

The proof of this theorem is provided in Section V. We first derive an outer bound by deriving two other
bounds: a cooperative bound and a coloring bound. For the coloring bound, we utilize a combinatorial
approach to bound the number of distinguishable symbol pairs that can be transmitted from the sources
to the receiver. We then show that a simple scheme that uses coding vectors achieves the outer bound.
We thus conclude that, for the case of two sources when $\frac{T}{2} > m_1 + m_2$, use of coding vectors is
(asymptotically) optimal.

IV. The Channel Capacity: Single Source Scenario 
In this section we will prove Theorem 2, Theorem 3, and Theorem 4.

A. Equivalence of the Matrix Channel $\mathrm{Ch}_m$ and the Subspace Channel $\mathrm{Ch}_s$
For convenience, let us rewrite the channel equation again:

$$Y = GX.$$

Footnote 7: In the rest of the paper we will omit the time index $t$ for convenience.

Fig. 3. The MAC region $\mathcal{R}^*$ for parameters $m_1 = 4$, $m_2 = 3$, $n = 3$, $T = 14$.



To find the capacity of the above channel we need to maximize the mutual information between the input
and the output of the channel with respect to the input distribution $P_X(x)$. Since the rows of $G$ are chosen
independently of each other, assuming that a matrix $X = x$ has been transmitted, we can think of the
rows of the received matrix $Y$ as chosen independently of each other among all the possible vectors
in the row span of $x$. The independence of the rows of $Y$ allows us to write the conditional probability of
$Y$ given $X$, referred to as the channel transition probability, as follows:

$$P_{Y|X}(y|x) = \begin{cases} q^{-n\dim(\langle x\rangle)} & \langle y\rangle \sqsubseteq \langle x\rangle, \\ 0 & \text{otherwise}, \end{cases} \tag{15}$$

where $x \in \mathcal{X} = \mathbb{F}_q^{m\times T}$ and $y \in \mathcal{Y} = \mathbb{F}_q^{n\times T}$.
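
The transition probability in (15) can be checked by brute force on a toy instance. The sketch below is only an illustration under stated assumptions: it works over a prime field (so that integer arithmetic modulo $q$ is field arithmetic), enumerates every $G \in \mathbb{F}_q^{n\times m}$, and tabulates the exact distribution of $Y = Gx$. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def matmul_mod(g, x, q):
    """Multiply g (n x m) by x (m x T) with arithmetic modulo the prime q."""
    return tuple(
        tuple(sum(g_row[k] * x[k][j] for k in range(len(x))) % q
              for j in range(len(x[0])))
        for g_row in g
    )

def row_span(x, q):
    """All vectors in the row span of x over F_q, found by enumeration."""
    m, T = len(x), len(x[0])
    return {
        tuple(sum(c[k] * x[k][j] for k in range(m)) % q for j in range(T))
        for c in product(range(q), repeat=m)
    }

def output_distribution(x, n, q):
    """Exact distribution of Y = G x for G uniform over F_q^{n x m}."""
    m = len(x)
    counts = Counter()
    for entries in product(range(q), repeat=n * m):
        g = tuple(entries[i * m:(i + 1) * m] for i in range(n))
        counts[matmul_mod(g, x, q)] += 1
    total = q ** (n * m)
    return {y: Fraction(c, total) for y, c in counts.items()}

# Rank-one input over F_2: every reachable output has probability 2^(-n*1).
x = ((1, 0, 1), (1, 0, 1))
dist = output_distribution(x, n=2, q=2)
print(len(row_span(x, 2)), sorted(set(dist.values())))
```

For this rank-one input with $n = 2$, the four reachable outputs are exactly the matrices whose rows lie in the row span of $x$, each with probability $q^{-n\dim(\langle x\rangle)} = 1/4$.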

The mutual information $I(X;Y)$ between $X$ and $Y$ is a function of $P_X(x)$ and $P_{Y|X}(y|x)$ that can
be expressed as

$$I(X;Y) = \sum_{x\in\mathcal{X},\, y\in\mathcal{Y}} P_X(x) P_{Y|X}(y|x) \log_2\left(\frac{P_{Y|X}(y|x)}{P_Y(y)}\right). \tag{16}$$

It is clear from (15) that $P_{Y|X}(y|x_1) = P_{Y|X}(y|x_2)$ for all $x_1, x_2 \in \mathcal{X}$ such that $\langle x_1\rangle = \langle x_2\rangle$, which
reveals a symmetry of the channel $\mathrm{Ch}_m$. We exploit this symmetry to show that $C_m = C_s$, as stated
in Theorem 1 and proved in Appendix A.

The proof of Theorem 1 determines how we can map an input distribution of $\mathrm{Ch}_s$ to an input distribution
for $\mathrm{Ch}_m$ that achieves the same mutual information. The input distribution $P_X(x)$ should be chosen such
that we have $\sum_{x\in\mathcal{X}:\langle x\rangle = \pi_x} P_X(x) = P_{\Pi_X}(\pi_x)$. One simple way to do this is to put all the probability
mass of $\pi_x$ on one matrix $x$ such that $\langle x\rangle = \pi_x$.

B. Upper and Lower Bounds for the Capacity of $\mathrm{Ch}_m$

Here, we state the proof of Theorem 2 by giving upper and lower bounds for the capacity that differ
by $o(1)$ bits, a gap that vanishes as $q \to \infty$.

Let $C_m(n,m,T)$ denote the capacity of the channel $\mathrm{Ch}_m$. Let $C_{f\text{-}m}(n,m,T)$ denote the capacity of
the channel $Y = AX$, where $A \in \mathbb{F}_q^{n\times m}$ is a full-rank matrix chosen uniformly at random among all the
full-rank matrices in $\mathbb{F}_q^{n\times m}$. Then, we have the following lemma.

Lemma 5: We can bound $C_m(n,m,T)$ from above and below as follows:

$$C_m(h,h,T) \le C_m(n,m,T) \le C_{f\text{-}m}(n,m,T) \le C_{f\text{-}m}(h,h,T),$$

where $h = \min[m,n]$.

Proof: Let $U_{n\times m} \in \mathbb{F}_q^{n\times m}$ denote a generic random matrix chosen uniformly at random and
independently of any other variable. Similarly, let $A_{n\times m} \in \mathbb{F}_q^{n\times m}$ denote a generic full-rank matrix
chosen uniformly at random among all such full-rank matrices and independently of any other variable.
(Note that each new instance of such a matrix in the same equation denotes a different random variable
which is independent of the other random variables.)

Since the channel $Y = A_{n\times m}X$ is statistically equivalent to the channel $Y = A_{n\times n}A_{n\times m}A_{m\times m}X$,
we have, by the data processing inequality, that $C_{f\text{-}m}(n,m,T) \le C_{f\text{-}m}(h,h,T)$.

Using the same argument, since the channel $Y = U_{n\times m}X$ is equivalent to the channel $Y = U_{n\times n}A_{n\times m}X$
if $n \ge m$, and is equivalent to the channel $Y = A_{n\times m}U_{m\times m}X$ if $n < m$, we have $C_m(n,m,T) \le
C_{f\text{-}m}(n,m,T)$.

To obtain the lower bound we proceed as follows. Let us choose

$$X = \begin{bmatrix} \hat{X} \\ 0 \end{bmatrix} \quad \text{and} \quad \hat{Y} = [I_h \;\; 0]\, Y,$$

where $Y = U_{n\times m}X$. Then we can write

$$\hat{Y} = [I_h \;\; 0]\, U_{n\times m} \begin{bmatrix} \hat{X} \\ 0 \end{bmatrix} = U_{h\times h}\hat{X},$$

where $U_{h\times h}$ is the upper left $h \times h$ sub-matrix of $U_{n\times m}$. Thus, again the data processing inequality
implies that $C_m(h,h,T) \le C_m(n,m,T)$. ■

Lemma 6: For $C_m(n,m,T)$ we have

$$C_m(n,m,T) \le i^*(T - i^*)\log_2 q + o(1),$$

where $i^* = \min[m, n, \lfloor T/2\rfloor]$.

Proof: By Lemma 5 we have

$$C_m(n,m,T) \le C_{f\text{-}m}(h,h,T) \overset{(a)}{=} \log_2\left[\sum_{i=0}^{h} \begin{bmatrix} T \\ i \end{bmatrix}_q\right] \overset{(b)}{=} i^*(T - i^*)\log_2 q + o(1),$$

where (a) follows from [16, Corollary 2] and (b) follows from Lemma 1.
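
As a numerical illustration of this upper bound, the sketch below compares $\log_2 \sum_{i=0}^{h} \begin{bmatrix} T \\ i \end{bmatrix}_q$ with the asymptotic expression $i^*(T-i^*)\log_2 q$ and shows the gap shrinking as $q$ grows. It assumes the sum runs over subspace dimensions $0,\ldots,h$; the function names and parameter values are illustrative and not part of the paper.

```python
import math

def gaussian_binomial(T, d, q):
    """Number of d-dimensional subspaces of F_q^T (Gaussian coefficient)."""
    if d < 0 or d > T:
        return 0
    num, den = 1, 1
    for i in range(d):
        num *= q ** (T - i) - 1
        den *= q ** (d - i) - 1
    return num // den  # the quotient is always an exact integer

def full_rank_capacity_bits(h, T, q):
    """log2 of the number of subspaces of F_q^T with dimension at most h."""
    return math.log2(sum(gaussian_binomial(T, i, q) for i in range(h + 1)))

def asymptotic_capacity_bits(n, m, T, q):
    """The leading term i*(T - i*) log2 q with i* = min[m, n, floor(T/2)]."""
    i_star = min(m, n, T // 2)
    return i_star * (T - i_star) * math.log2(q)

for q in (2, 16, 256):
    gap = full_rank_capacity_bits(2, 6, q) - asymptotic_capacity_bits(2, 2, 6, q)
    print(q, round(gap, 4))
```

The printed gap is the $o(1)$ term for these particular parameters; it is positive and decreases toward zero as the field size increases.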
Lemma 7: For $C_m(n,m,T)$ we have

$$C_m(n,m,T) \ge i^*(T - i^*)\log_2 q + o(1),$$

where $i^* = \min[m, n, \lfloor T/2\rfloor]$.



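Proof: For every subspace $\Pi \in \mathrm{Gr}(T, i^*)$, let $\mathrm{RREF}(\Pi) \in \mathbb{F}_q^{i^*\times T}$ be a matrix in reduced row
echelon form such that $\Pi = \langle \mathrm{RREF}(\Pi)\rangle$. Choose

$$X = \begin{bmatrix} \mathrm{RREF}(\Pi_X) \\ 0 \end{bmatrix} \in \mathbb{F}_q^{h\times T},$$

where $\Pi_X$ is chosen uniformly at random from $\mathrm{Gr}(T, i^*)$. Define the random variable $Q = \mathbb{1}_{\{\mathrm{rank}(Y) = i^*\}}$. Note that
$\Pi_Y = \Pi_X$ when $Q = 1$. Thus, we have $H(\Pi_Y|\Pi_X, Q = 1) = 0$ and $H(\Pi_Y|Q = 1) = H(\Pi_X) =
\log_2 \begin{bmatrix} T \\ i^* \end{bmatrix}_q \ge i^*(T - i^*)\log_2 q$. Then, it follows that

$$\begin{aligned}
C_m(n,m,T) &\overset{(a)}{\ge} C_m(h,h,T) \\
&\overset{(b)}{\ge} I(\Pi_X;\Pi_Y) \\
&\overset{(c)}{=} I(\Pi_X;\Pi_Y,Q) \\
&= I(\Pi_X;Q) + I(\Pi_X;\Pi_Y|Q) \\
&\ge P[Q = 1]\, I(\Pi_X;\Pi_Y|Q = 1) \\
&\ge P[Q = 1]\, i^*(T - i^*)\log_2 q,
\end{aligned}$$

where (a) is due to Lemma 5, (b) follows from Theorem 1, and (c) holds since $Q$ is a deterministic
function of $\Pi_Y$. Now, note that we can write

$$\begin{aligned}
P[Q = 1] &= P[\mathrm{rank}(U_{h\times h}X) = i^*] \\
&= P\left[\mathrm{rank}\left(U_{h\times h}\begin{bmatrix} \mathrm{RREF}(\Pi_X) \\ 0 \end{bmatrix}\right) = i^*\right] \\
&= P[\mathrm{rank}(U_{h\times i^*}) = i^*] \\
&\ge 1 - \frac{i^*}{q},
\end{aligned}$$

and thus we obtain the desired result. ■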
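
The last bound can be checked numerically. A uniformly random $h \times i^*$ matrix over $\mathbb{F}_q$ with $i^* \le h$ has full column rank with probability $\prod_{j=0}^{i^*-1}(1 - q^{j-h})$, a standard counting fact that is used here as an assumption about how the bound is obtained. The sketch below evaluates this product exactly and compares it with $1 - i^*/q$; the function name is hypothetical.

```python
from fractions import Fraction

def prob_full_column_rank(h, i_star, q):
    """P[rank(U) = i_star] for U uniform over F_q^{h x i_star}, i_star <= h."""
    p = Fraction(1)
    for j in range(i_star):
        # Column j must avoid the span of the previous j columns (q^j vectors).
        p *= 1 - Fraction(q ** j, q ** h)
    return p

for q in (2, 16, 256):
    exact = prob_full_column_rank(3, 2, q)
    print(q, float(exact), 1 - 2 / q)
```

For every field size the exact probability stays above $1 - i^*/q$, and both tend to one as $q$ grows, which is what makes the lower bound of Lemma 7 asymptotically tight.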
Combining Lemma 6 and Lemma 7 recovers Theorem 2.

C. The Optimal Solution: General Approach 

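Generally, we are interested in finding the capacity and the optimal input distribution of $\mathrm{Ch}_m$ exactly. It is shown
in Theorem 1 that instead of the channel $\mathrm{Ch}_m$ we can focus on the channel $\mathrm{Ch}_s$. Thus, we are interested
in optimizing the following quantity:

$$I(\Pi_X;\Pi_Y) = \sum_{\pi_x,\, \pi_y} P_{\Pi_X}(\pi_x) P_{\Pi_Y|\Pi_X}(\pi_y|\pi_x) \log_2\left(\frac{P_{\Pi_Y|\Pi_X}(\pi_y|\pi_x)}{P_{\Pi_Y}(\pi_y)}\right). \tag{17}$$

Recall that the input and output alphabets of $\mathrm{Ch}_s$ are $\mathrm{Sp}(T,m)$ and $\mathrm{Sp}(T,n)$, respectively.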

The following lemma states that the optimal solution for the channel Ch s should be uniform over all 
subspaces with the same dimension, as it is intuitively expected from the symmetry of the channel. 

Lemma 8: The input distribution that maximizes $I(\Pi_X;\Pi_Y)$ for $\mathrm{Ch}_s$ is the one which is uniform over
all subspaces having the same dimension.

Lemma 8 shows that the optimal input distribution can be expressed as

$$P[\Pi_X = \pi_x] = \frac{\alpha_{d_x}}{\begin{bmatrix} T \\ d_x \end{bmatrix}_q}, \tag{18}$$

where $d_x = \dim(\pi_x)$, $\alpha_{d_x} = P[\dim(\Pi_X) = d_x]$, and we have $\sum_{d_x=0}^{\min[m,T]} \alpha_{d_x} = 1$. We can then simplify
$I(\Pi_X;\Pi_Y)$ as stated in the following lemma.
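
Lemma 9: Assuming an optimal input probability distribution of the form in (18), the mutual
information $I(\Pi_X;\Pi_Y)$ can be simplified to

$$I(\Pi_X;\Pi_Y) = -\sum_{d_x=0}^{\min[m,T]} \alpha_{d_x}\, n d_x \log_2 q - \sum_{d_x=0}^{\min[m,T]} \sum_{d_y=0}^{\min[n,d_x]} \alpha_{d_x}\, \psi(n,d_y) \begin{bmatrix} d_x \\ d_y \end{bmatrix}_q q^{-n d_x} \log_2(f(d_y)), \tag{19}$$

where

$$f(d_y) \triangleq \sum_{d_x=d_y}^{\min[m,T]} \frac{\begin{bmatrix} d_x \\ d_y \end{bmatrix}_q}{\begin{bmatrix} T \\ d_y \end{bmatrix}_q}\, q^{-n d_x} \alpha_{d_x}. \tag{20}$$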

Lemmas 8 and 9 show that the problem of finding the optimal input distribution for the channel
$\mathrm{Ch}_s$ is reduced to finding the optimal choice for $\alpha_i$, $i = 0, \ldots, \min[m,T]$. We know that the mutual
information is a concave function with respect to the $P_{\Pi_X}(\pi_x)$'s. Because (18) expresses the
$P_{\Pi_X}(\pi_x)$'s as a linear transformation of the $\alpha_i$'s, Observation 1 implies that the mutual information $I(\Pi_X;\Pi_Y)$ is also
concave with respect to the $\alpha_i$'s [18].

Observation 1: Let $g(x)$ be a concave function and let $x = h(z)$ be a linear transform from $z$ to $x$.
Then $g(h(z))$ is also a concave function.

Using Observation 1 we know that the mutual information is a concave function with respect to the $\alpha_k$'s.
This allows us to use the Kuhn-Tucker theorem ifTHTl to solve the convex optimization problem. According 
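to this theorem, the set of probabilities $\alpha_i^*$, $0 \le i \le \min[m,T]$, maximize the mutual information if and
only if there exists some constant $\lambda$ such that

$$\left.\frac{\partial I(\Pi_X;\Pi_Y)}{\partial \alpha_k}\right|_{\alpha = \alpha^*} \begin{cases} = \lambda & \forall k : \alpha_k^* > 0, \\ \le \lambda & \forall k : \alpha_k^* = 0, \end{cases} \tag{21}$$

where $\sum_{k=0}^{\min[m,T]} \alpha_k^* = 1$, $0 \le k \le \min[m,T]$, and $\alpha^*$ is the vector of the optimum input probabilities
of choosing subspaces of a certain dimension,

$$\alpha^* = \begin{bmatrix} \alpha_0^* & \cdots & \alpha_{\min[m,T]}^* \end{bmatrix}^T.$$
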

Lemma 10: By taking the partial derivative of the mutual information given in (19) with respect to
$\alpha_k$, we have

$$I'_k \triangleq \frac{\partial I(\Pi_X;\Pi_Y)}{\partial \alpha_k} = -nk\log_2 q - \sum_{d_y=0}^{\min[n,k]} \psi(n,d_y) \begin{bmatrix} k \\ d_y \end{bmatrix}_q q^{-nk} \log_2(f(d_y)) - \log_2 e. \tag{22}$$

Multiplying both sides of (22) by $\alpha_k$ and summing over $k$ we get

$$I - \log_2 e = \sum_{k=0}^{\min[m,T]} \alpha_k I'_k.$$

By choosing the optimal values $\alpha_k = \alpha_k^*$ for $0 \le k \le \min[m,T]$, the RHS becomes $\lambda$, and the mutual
information increases to $C_s$. So we may write $\lambda = C_s - \log_2 e$.

D. Solution for Large Field Size 

In this subsection, we focus on large field sizes, $q \gg 1$. This assumption allows us to use some
approximations to simplify the conditions in (21). Assuming large $q$, we can rewrite (22) as follows:

$$I'_k = -nk\log_2 q - \log_2 e - \sum_{d_y=0}^{\min[n,k]} \left(1 + O(q^{-1})\right) q^{-(n-d_y)(k-d_y)} \log_2(f(d_y)), \tag{23}$$

where we have used Lemma 1 and Lemma 3. Using similar approximations, $\log_2 f(d_y)$ defined in (20)
can be approximated as

$$\log_2 f(d_y) \approx -d_y T \log_2 q + \log_2\left(\sum_{d_x=d_y}^{\min[m,T]} q^{-(n-d_y)d_x} \alpha_{d_x}\right). \tag{24}$$

Then we have the following result, Lemma 11.

Lemma 11: The dominating term in the summation in (23) is the one obtained for $d_y = \min[n,k]$.

From the proof of Lemma 11 in Appendix E, we can also see that the remaining terms in the
summation of (23) are of order $o(1)$, so we can write

$$I'_k = -nk\log_2 q - \log_2 e + T\min[n,k]\log_2 q - \log_2\left(\sum_{d_x=\min[n,k]}^{\min[m,T]} q^{-[n-\min[n,k]]d_x} \alpha_{d_x}\right) + o(1). \tag{25}$$

Assuming that the expression inside the $\log_2(\cdot)$ function in (25) is not zero for every $0 \le k \le \min[m,T]$,
we can rewrite the Kuhn-Tucker conditions as

$$\sum_{d_x=\min[n,k]}^{\min[m,T]} q^{-[n-\min[n,k]]d_x} \alpha^*_{d_x} \ge 2^{-C_s+o(1)}\, q^{[T\min[n,k]-nk]},$$

where the inequality holds with equality for all $k$ with $\alpha_k^* > 0$.

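Let $\delta = \min[m,T]$ and define the $(\delta+1) \times (\delta+1)$ matrix $A$ with elements

$$A_{ij} = \begin{cases} q^{-[n-\min[n,i]]j} & \min[n,i] \le j \le \delta, \\ 0 & \text{otherwise}. \end{cases}$$

We also define the column vector $b$ with elements $b_i = q^{[T\min[n,i]-ni]}$ for $0 \le i \le \delta$. Note that for
convenience the indices of matrix $A$ and vector $b$ start from 0. Using these definitions, we are able to
rewrite the Kuhn-Tucker conditions in matrix form as

$$A\alpha^* \succeq 2^{-C_s+o(1)}\, b. \tag{26}$$

In the following, we consider the two cases $\delta \le n$ and $\delta > n$, and find $\alpha^*$ for each of them separately.
First case: $\delta \le n$. In this case we can explicitly write the matrix $A$ and vector $b$ as

$$A = \begin{bmatrix}
1 & q^{-n} & \cdots & q^{-(\delta-1)n} & q^{-\delta n} \\
0 & q^{-(n-1)} & \cdots & q^{-(\delta-1)(n-1)} & q^{-\delta(n-1)} \\
0 & 0 & \cdots & q^{-(\delta-1)(n-2)} & q^{-\delta(n-2)} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & q^{-(\delta-1)(n-\delta+1)} & q^{-\delta(n-\delta+1)} \\
0 & 0 & \cdots & 0 & q^{-\delta(n-\delta)}
\end{bmatrix}$$

and

$$b = \begin{bmatrix} 1 & q^{(T-n)} & \cdots & q^{\delta(T-n)} \end{bmatrix}^T.$$

The fact that the expression inside the $\log_2(\cdot)$ function in (25) is non-zero for $k = \delta$ forces $\alpha_\delta^*$ to be
positive. Thus the last row of the matrix inequality in (26) should be satisfied as an equality. Therefore,

$$\alpha_\delta^* = \frac{q^{\delta(T-n)}\, 2^{-C_s+o(1)}}{q^{-\delta(n-\delta)}} = q^{\delta(T-\delta)}\, 2^{-C_s+o(1)}.$$
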

Now we use induction to show that the optimal solution has the form

$$\alpha_i^* = \begin{cases} q^{i(T-i)}\, 2^{-C_s+o(1)} & \kappa \le i \le \delta, \\ 0 & 0 \le i < \kappa, \end{cases} \tag{27}$$

where we will determine $\kappa$ later.

Let us fix $l$ and assume that $\alpha_i^* = q^{i(T-i)}\, 2^{-C_s+o(1)}$ for $0 \le l < i \le \delta$. Then for $\alpha_l^*$ we can write

$$A_{ll}\alpha_l^* + \sum_{j=l+1}^{\delta} q^{-(n-l)j} \alpha_j^* \ge q^{l(T-n)}\, 2^{-C_s+o(1)},$$

or equivalently

$$A_{ll}\alpha_l^* \ge q^{l(T-n)}\, 2^{-C_s+o(1)} - \sum_{j=l+1}^{\delta} q^{-(n-l)j} \alpha_j^* = q^{l(T-n)}\, 2^{-C_s+o(1)} \left[1 - \sum_{j=l+1}^{\delta} q^{(T-n-j)(j-l)}\right]. \tag{28}$$

We can use induction for one more step to show that $\alpha_l^*$ is of the desired form (27) if the previous
expression is satisfied with equality. This is true if we have $1 - \sum_{j=l+1}^{\delta} q^{(T-n-j)(j-l)} > 0$, or equivalently
(assuming large $q$) if we have $(T-n-j)\big|_{j=l+1} < 0$. So we can conclude that we should have $(T-n)^+ \le
l \le \delta$. It can be easily verified that for $l < (T-n)^+$ the Kuhn-Tucker equation for $\alpha_l^*$ satisfies the strict
inequality, so $\alpha_i^* = 0$ for $i < \min[(T-n)^+, \delta]$. The above argument results in a solution of the following
form for the case $\delta \le n$:

$$\alpha_i^* = \begin{cases} q^{i(T-i)}\, 2^{-C_s+o(1)} & \min[(T-n)^+, \delta] \le i \le \delta, \\ 0 & 0 \le i < \min[(T-n)^+, \delta]. \end{cases} \tag{29}$$

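Second case: $\delta > n$. We now write matrix $A$ and vector $b$ as

$$A = \begin{bmatrix}
1 & q^{-n} & \cdots & \cdots & \cdots & \cdots & q^{-\delta n} \\
0 & q^{-(n-1)} & \cdots & \cdots & \cdots & \cdots & q^{-\delta(n-1)} \\
\vdots & & \ddots & & & & \vdots \\
0 & \cdots & 0 & q^{-(n-1)} & q^{-n} & \cdots & q^{-\delta} \\
0 & \cdots & 0 & 0 & 1 & \cdots & 1 \\
\vdots & & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & 1 & \cdots & 1
\end{bmatrix}$$

and

$$b = \begin{bmatrix} 1 & q^{(T-n)} & \cdots & q^{(n-1)(T-n)} & q^{n(T-n)} & q^{n(T-n-1)} & \cdots & q^{n(T-\delta)} \end{bmatrix}^T.$$

The last $\delta - n + 1$ rows of $A$ are the same, while $b_i$ is decreasing with $i$ for $i \ge n$. Thus, the last $\delta - n$
inequalities are strict and therefore

$$\alpha_{n+1}^* = \cdots = \alpha_\delta^* = 0. \tag{30}$$

The remaining equations can simply be reduced to the first case. Define

$$\hat{A} = \begin{bmatrix}
1 & q^{-n} & \cdots & q^{-(n-1)n} & q^{-n^2} \\
0 & q^{-(n-1)} & \cdots & q^{-(n-1)(n-1)} & q^{-n(n-1)} \\
0 & 0 & \cdots & q^{-(n-1)(n-2)} & q^{-n(n-2)} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & q^{-(n-1)} & q^{-n} \\
0 & 0 & \cdots & 0 & 1
\end{bmatrix}$$

and

$$\hat{b} = \begin{bmatrix} 1 & q^{(T-n)} & \cdots & q^{n(T-n)} \end{bmatrix}^T.$$

The remaining conditions in this case can be written as

$$\hat{A}\hat{\alpha}^* \succeq 2^{-C_s+o(1)}\, \hat{b},$$

which has exactly the form of (26) for $\delta = n$. Therefore, the optimal solution for the first case will also
satisfy these conditions, i.e.,

$$\alpha_i^* = \begin{cases} q^{i(T-i)}\, 2^{-C_s+o(1)} & \kappa \le i \le n, \\ 0 & 0 \le i < \kappa, \end{cases} \tag{31}$$

with $\kappa = \min[(T-n)^+, n]$. Summarizing (30) and (31), we can obtain the optimal solution for this
regime as

$$\alpha_i^* = \begin{cases} 0 & n < i \le \delta, \\ q^{i(T-i)}\, 2^{-C_s+o(1)} & \kappa \le i \le n, \\ 0 & 0 \le i < \kappa, \end{cases} \tag{32}$$

where $\kappa = \min[(T-n)^+, n]$. This completes the proof of Theorem 3. By normalizing $\alpha^*$ to 1 we can
also obtain an alternative proof of Theorem 2.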
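
The structure of the solution in (29) and (32) can be summarized in a few lines of code. The sketch below is an illustration that ignores the $o(1)$ terms: it returns the set of active dimensions and the normalized weights proportional to $q^{i(T-i)}$. The function names are hypothetical, and the weights are therefore only the large-$q$ approximation of $\alpha^*$, not the exact optimum.

```python
from fractions import Fraction

def optimal_support(n, m, T):
    """Dimensions i with alpha_i^* > 0 according to (29) and (32)."""
    delta = min(m, T)
    top = min(delta, n)              # delta in the first case, n in the second
    kappa = min(max(T - n, 0), top)  # kappa = min[(T - n)^+, top]
    return list(range(kappa, top + 1))

def approximate_alpha(n, m, T, q):
    """Weights proportional to q^(i(T-i)) on the support, o(1) terms dropped."""
    support = optimal_support(n, m, T)
    weights = {i: Fraction(q ** (i * (T - i))) for i in support}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}

print(optimal_support(3, 2, 6))   # T >= n + min[m, n]: a single active dimension
print(optimal_support(4, 3, 5))   # shorter packets: several active dimensions
print(approximate_alpha(4, 3, 5, 2))
```

When $T \ge n + \min[m,n]$, the support collapses to the single dimension $\min[m,n]$, which is the regime considered in the proof of Theorem 4; for shorter packets several dimensions are active.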

Discussion: To characterize the exact value of $q_0$, one has to consider the exact form of the set of
equations given in (28) (for each $l$), which is as follows:

$$A_{ll}\alpha_l^* \ge q^{l(T-n)}\, 2^{-C_s+\epsilon_q(l)} \left[1 - \sum_{j=l+1}^{\delta} q^{(T-n-j)(j-l)}\, 2^{[\epsilon_q(j)-\epsilon_q(l)]}\right].$$

Although it is hard to find $q_0$ exactly, it is possible to show that there exists a finite $q_0$ such that the result of
Theorem 3 holds for $q > q_0$. This can be done by solving the above equations assuming that $\epsilon_q(k)$ is zero for every
$k$ (assuming $q \gg 1$). Then, it can be observed that the RHS of (28) is either greater or less than zero.
Now, by assuming a finite but large enough $q$ and considering the exact form of (28), we have some small
perturbations that cannot change the sign of the RHS of (28), so we are done.



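E. Proof of Theorem 4

Let $\epsilon_q(k)$ denote the error term in (25). We can easily write the exact expression for $\epsilon_q(k)$, which is
as follows:

$$\epsilon_q(k) = -\sum_{d_y=0}^{r_k} \psi(n,d_y) \begin{bmatrix} k \\ d_y \end{bmatrix}_q q^{-nk} \log_2(f(d_y)) - r_k(T - r_k)\log_2 q + \log_2\left(\sum_{d_x=r_k}^{\min[m,T]} q^{r_k(d_x - r_k) - n d_x} \alpha_{d_x}\right),$$

where $r_k = \min[n,k]$.

We consider the case where $T \ge n + \min[m,n]$, so Theorem 3 implies that for the optimal input
distribution we have $\alpha_{i^*} = 1$, where $i^* = \min[m,n]$ and $q > q_0$. Then we can simplify $\epsilon_q(k)$ further and
write

$$\epsilon_q(k) = \sum_{d_y=0}^{r_k} \psi(n,d_y) \begin{bmatrix} k \\ d_y \end{bmatrix}_q q^{-nk} \log_2\left(\frac{\begin{bmatrix} T \\ d_y \end{bmatrix}_q}{\begin{bmatrix} i^* \\ d_y \end{bmatrix}_q}\right) - r_k(T - i^*)\log_2 q, \tag{33}$$

where we also use Lemma 4 in the above simplification.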

To find $q_0$, the minimum value of $q$ for which the result of Theorem 4 is valid, we should consider the
exact form of (28) and check that the RHS of (28) is less than or equal to zero for $0 \le l \le (i^* - 1)$. So
from (28), for every $0 \le l \le (i^* - 1)$ we may write

$$1 - q_0^{(T-n-i^*)(i^*-l)}\, 2^{[\epsilon_{q_0}(i^*) - \epsilon_{q_0}(l)]} \le 0,$$

or equivalently

$$\frac{\epsilon_{q_0}(l) - \epsilon_{q_0}(i^*)}{(T-n-i^*)(i^*-l)} \le \log_2 q_0, \quad \forall l : 0 \le l \le (i^* - 1). \tag{34}$$

Using a similar argument we should also have

$$\frac{\epsilon_{q_0}(l) - \epsilon_{q_0}(i^*)}{i^*(l - i^*)} \le \log_2 q_0, \quad \forall l : (i^* + 1) \le l \le m. \tag{35}$$


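From (32), for the capacity $C_s$ we have $C_s = i^*(T - i^*)\log_2 q + \epsilon_q(i^*)$. Evaluating (33) at $k = i^*$ we
have

$$\epsilon_q(i^*) = \sum_{d_y=0}^{i^*} \psi(n,d_y) \begin{bmatrix} i^* \\ d_y \end{bmatrix}_q q^{-n i^*} \log_2\left(\frac{\begin{bmatrix} T \\ d_y \end{bmatrix}_q}{\begin{bmatrix} i^* \\ d_y \end{bmatrix}_q}\right) - i^*(T - i^*)\log_2 q,$$

which results in the capacity stated in the assertion of Theorem 4.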

Discussion: We derive a sufficient condition on the minimum size of $q$ to satisfy the set of conditions
stated in (34) and (35). Using this sufficient condition we explore the behavior of $q_0$ as $T$ increases.
For $k \neq i^*$ we can write

(a) r ' k 

e q (k) < 4 £ q -(n-d y )(k-d y ) lQg2 (y.cr-i^ _ rfc (T _ e) lQg2 ? 



-(max[n,fc]— min[n,fc]+l) 



< 8 + 4r k q 
(*>) 

< (8 + 8r fe )+ (4r fc (r fc -l)(T-i*)- 



2 + (r fc -l)(T-r)log 2 



log 2 <? 



g(max[n,fc]— min[n,fc]+l) / ' 

where (a) follows from ([2]) and (O, and in (b) we use the fact that k ^ i* . 



(36) 
Then for $k = i^*$ we can write

~T 

i* 



e q (e)>^n,i*)q- ni ' log 2 



i*(T-i*)log 2 q 



>-(n 2 (r-n^r, (37) 

where (a) follows from Q and ([3]). 

Let us consider two cases. First, we assume that $m \le n$ so $i^* = m$. To find a sufficient condition for
$q_0$ we only have to consider the conditions given in (34). Using (36) and (37) and assuming that $T \to \infty$,
we should have $\log_2 q_0 > 5m^2 q_0^{-n+m-1}\log_2 q_0$, or equivalently $q_0^{n-m+1} > 5(i^*)^2$.

For the second case we have $m > n$, which means $i^* = n$. Here, using an argument similar to the
one given above for the first case, we can show that conditions (34) give some constant $q_0$ as $T \to \infty$.
However, the conditions (35) give a sufficient condition for $q_0$ which grows as $T \to \infty$. Now, using
(35), (36), and (37) and assuming that $T \to \infty$, a sufficient condition for $q_0$ would be $\log_2 q_0 >
4nTq_0^{-2}\log_2 q_0 + nTq_0^{-1}\log_2 q_0$. For large $T$, the sufficient condition becomes $q_0 > i^*T$.

V. Multiple Sources Scenario: The Rate Region 

The goal of this section is to characterize $\mathcal{R}$, the set of all achievable rate pairs $(R_1, R_2)$ for two-user
communication over the multiple access channel $\mathrm{Ch}_{m\text{-MAC}}$ described in Definition 6. More precisely, we
will show that $\mathcal{R} = \mathcal{R}^*$. In order to do this, we first formulate a mathematical model for this channel.
Then, we present an achievability scheme to show that $\mathcal{R}^*$ is achievable, i.e., $\mathcal{R}^* \subseteq \mathcal{R}$. In the next
subsection we prove the optimality of this scheme and show that $\mathcal{R} \subseteq \mathcal{R}^*$.

The proof of the converse part of the theorem is based on two outer bounds, namely, a cooperative 
bound and a coloring bound. For the coloring bound, we utilize a combinatorial argument to bound the 
number of distinguishable symbol pairs that can be transmitted from the two sources to the destination. 
This bound allows us to restrict the effective input alphabets of the sources to subsets of the original 
alphabets, with significantly smaller size. We can then easily bound the capacity region of the network 
using the restricted input alphabet. 

The transition probability of the channel given by Definition 6, $P_{Y|X_1X_2}$, can be written as

$$P_{Y|X_1X_2}(y|x_1,x_2) = \begin{cases} q^{-n\dim(\langle x_1\rangle + \langle x_2\rangle)} & \langle y\rangle \sqsubseteq \langle x_1\rangle + \langle x_2\rangle, \\ 0 & \text{otherwise}. \end{cases} \tag{38}$$

Our first result, stated in Theorem 6, is that the multiple access matrix channel described in Definition 6
is equivalent to the "subspace" channel $\mathrm{Ch}_{s\text{-MAC}}$ described in Definition 7 that has subspaces as inputs
and outputs. So to characterize the optimal rate region of $\mathrm{Ch}_{m\text{-MAC}}$, we can focus on finding the optimal
rate region of $\mathrm{Ch}_{s\text{-MAC}}$. We will use this equivalence in the rest of this section.

We know from [15] that the rate region of the multiple access channel $\mathrm{Ch}_{s\text{-MAC}}$ is given by the closure
of the convex hull of the rate vectors satisfying

$$R_S \le I(\Pi_{X_S};\Pi_Y|\Pi_{X_{S^c}}) \quad \text{for all } S \subseteq \{1, \ldots, N_s\},$$

for some product distribution $P_{\Pi_{X_1}}(\pi_1) \cdots P_{\Pi_{X_{N_s}}}(\pi_{N_s})$. Note that $R_S = \sum_{i\in S} R_i$, where $R_i$ is the
transmission rate of the $i$th source, $\Pi_{X_S} = \{\Pi_{X_i} : i \in S\}$, and $S^c$ is the complement of $S$.



A. Achievability Scheme 

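In this subsection we illustrate a simple achievability scheme for the corner points of the rate region
defined in Theorem 7. The remaining points in the rate region can be achieved using time-sharing.
For given $(d_1,d_2) \in \mathcal{D}^*$, define the following subspace code-books:

$$\mathcal{C}_1 \triangleq \left\{ \langle X_1\rangle : X_1 = \begin{bmatrix} I_{d_1\times d_1} & 0_{d_1\times d_2} & U_1 \\ 0_{(m_1-d_1)\times d_1} & 0_{(m_1-d_1)\times d_2} & 0_{(m_1-d_1)\times(T-d_1-d_2)} \end{bmatrix},\ U_1 \in \mathbb{F}_q^{d_1\times(T-d_1-d_2)} \right\}$$

and

$$\mathcal{C}_2 \triangleq \left\{ \langle X_2\rangle : X_2 = \begin{bmatrix} 0_{d_2\times d_1} & I_{d_2\times d_2} & U_2 \\ 0_{(m_2-d_2)\times d_1} & 0_{(m_2-d_2)\times d_2} & 0_{(m_2-d_2)\times(T-d_1-d_2)} \end{bmatrix},\ U_2 \in \mathbb{F}_q^{d_2\times(T-d_1-d_2)} \right\}.$$

If we transmit messages from these code-books, we have

$$Y = H_1X_1 + H_2X_2 = \begin{bmatrix} \hat{H}_1 & \hat{H}_2 & \hat{H}_1U_1 + \hat{H}_2U_2 \end{bmatrix},$$

where $\hat{H}_i$ captures the first $d_i$ columns of $H_i$. Therefore, decoding at the receiver amounts to recovering
$U_1$ and $U_2$ given $\hat{H}_1U_1 + \hat{H}_2U_2$, $\hat{H}_1$, and $\hat{H}_2$. Since $d_1 + d_2 \le n$, the matrix $[\hat{H}_1 \;\; \hat{H}_2]$ is full-rank
with high probability, and therefore the decoder is able to decode $U_1$ and $U_2$.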
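
The scheme above can be simulated directly. The following sketch is an illustration under stated assumptions: it works over a prime field $\mathbb{F}_p$ (so that inverses can be computed with modular arithmetic), draws $H_1$ and $H_2$ uniformly at random, and decodes by Gaussian elimination on $[\hat{H}_1 \;\; \hat{H}_2]$. All function names and parameter values are hypothetical.

```python
import random

def mat_mul(a, b, p):
    """Product of matrices a and b (lists of rows) over the prime field F_p."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) % p for col in cols] for row in a]

def solve_mod_p(a, b, p):
    """Solve a @ u = b over F_p when a (n x d) has full column rank, else None."""
    n, d = len(a), len(a[0])
    aug = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(d):
        pivot = next((r for r in range(col, n) if aug[r][col] % p), None)
        if pivot is None:
            return None  # rank deficient: decoding fails
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)
        aug[col] = [(v * inv) % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(vr - f * vc) % p for vr, vc in zip(aug[r], aug[col])]
    return [row[d:] for row in aug[:d]]

def simulate_mac_scheme(m1, m2, d1, d2, n, T, p, rng):
    """One channel use; True if (U1, U2) is recovered, None if rank deficient."""
    w = T - d1 - d2
    rand = lambda r, c: [[rng.randrange(p) for _ in range(c)] for _ in range(r)]
    u1, u2 = rand(d1, w), rand(d2, w)
    # X1 = [I 0 U1; 0 0 0] and X2 = [0 I U2; 0 0 0], as in the code-books above.
    x1 = [[int(c == r) for c in range(d1)] + [0] * d2 + u1[r] for r in range(d1)]
    x1 += [[0] * T for _ in range(m1 - d1)]
    x2 = [[0] * d1 + [int(c == r) for c in range(d2)] + u2[r] for r in range(d2)]
    x2 += [[0] * T for _ in range(m2 - d2)]
    h1, h2 = rand(n, m1), rand(n, m2)
    y = [[(s + t) % p for s, t in zip(ra, rb)]
         for ra, rb in zip(mat_mul(h1, x1, p), mat_mul(h2, x2, p))]
    # The first d1 + d2 columns of Y reveal [H1_hat H2_hat]; the rest is the payload.
    h_hat = [row[:d1 + d2] for row in y]
    payload = [row[d1 + d2:] for row in y]
    u = solve_mod_p(h_hat, payload, p)
    if u is None:
        return None
    return u[:d1] == u1 and u[d1:] == u2

rng = random.Random(0)
print([simulate_mac_scheme(3, 3, 2, 2, 5, 10, 101, rng) for _ in range(5)])
```

Each user conveys $d_i(T - d_1 - d_2)$ field symbols per channel use, which matches $R_i(d_1,d_2)$ in Theorem 7; decoding fails only in the low-probability event that $[\hat{H}_1 \;\; \hat{H}_2]$ is rank deficient.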

Note that the achievability scheme effectively uses the coding vectors approach [12]. This indicates
that for $\frac{T}{2} > \max[m_1 + m_2, n]$ and $q$ large enough, subspace coding and the coding vectors approach
achieve the same rate.

B. Outer Bound on the Admissible Rate Region

In the following we will present an outer bound for $\mathcal{R}$, the admissible rate region of the non-coherent
two-user multiple access channel $\mathrm{Ch}_{m\text{-MAC}}$. Recall that by Theorem 6 we can focus on the subspace
channel $\mathrm{Ch}_{s\text{-MAC}}$. We first show in Proposition 1 that $\mathcal{R} \subseteq \mathcal{R}_{\mathrm{coop}}$, a cooperative outer bound. Then
Proposition 2 demonstrates that $\mathcal{R} \subseteq \mathcal{R}_{\mathrm{col}}$, a coloring outer bound. Finally we show that $\mathcal{R}_{\mathrm{col}} \cap \mathcal{R}_{\mathrm{coop}} \subseteq
\mathcal{R}^*$, yielding the desired outer bound $\mathcal{R} \subseteq \mathcal{R}^*$, which matches the achievability of Section V-A.

The first outer bound, called the cooperative outer bound, is simply obtained by letting the two transmitters
cooperate to transmit their messages to the receiver, i.e., we assume they form a super-source. Applying
Theorem 2 for the non-coherent scenario to the single super-source, which controls the packets
of both transmitters, we have the following proposition.

Proposition 1: Let $\frac{T}{2} > m_1 + m_2$. We have $\mathcal{R} \subseteq \mathcal{R}_{\mathrm{coop}}$, where

$$\mathcal{R}_{\mathrm{coop}} = \{(R_1,R_2) : R_1 + R_2 \le k(T - k)\log_2 q\},$$

and $k = \min[m_1 + m_2, n]$.

The rest of this section is dedicated to deriving the second outer bound, which is denoted by $\mathcal{R}_{\mathrm{col}}$. This
bound is based on an argument on the number of messages per channel use that each user can reliably
communicate over the multiple access channel.

Let $(R_1,R_2) \in \mathcal{R}$ be an achievable rate pair for which there exists an encoding and decoding scheme
with block length $N$ and small error probability. One can follow the usual converse proof of the multiple
access channel from [15] to show that

$$R_1 \le \frac{1}{N} I(\Pi_{X_1}^N;\Pi_Y^N|\Pi_{X_2}^N) \le \frac{1}{N}\sum_{t=1}^{N} I(\Pi_{X_{1t}};\Pi_{Y_t}|\Pi_{X_{2t}}),$$

$$R_2 \le \frac{1}{N} I(\Pi_{X_2}^N;\Pi_Y^N|\Pi_{X_1}^N) \le \frac{1}{N}\sum_{t=1}^{N} I(\Pi_{X_{2t}};\Pi_{Y_t}|\Pi_{X_{1t}}),$$

$$R_1 + R_2 \le \frac{1}{N} I(\Pi_{X_1}^N,\Pi_{X_2}^N;\Pi_Y^N) \le \frac{1}{N}\sum_{t=1}^{N} I(\Pi_{X_{1t}},\Pi_{X_{2t}};\Pi_{Y_t}).$$

For each time instance $t$, denote by $\mathcal{C}_{i,t}$ the projection of the code-book used by user $i$ onto its $t$-th element.
For a single source scenario, we have shown in Section IV that we can use the set $\mathrm{Sp}(T,m)$ as our input alphabet
for all time slots and have the receiver successfully decode the sent messages, and hence the user can
communicate $S(T,m)$ distinct messages. For the multi-source case, $\mathcal{C}_{i,t}$ is more restricted. The main
reason for this is that the transition probability of the multiple access channel $P_{\Pi_Y|\Pi_{X_1}\Pi_{X_2}}$ is of the form
$P_{\Pi_Y|\Pi_{X_1}+\Pi_{X_2}}$. That is, if $(\pi_1,\pi_2) \in \mathcal{X}_1 \times \mathcal{X}_2$ and $(\pi_1',\pi_2') \in \mathcal{X}_1 \times \mathcal{X}_2$ satisfy $\pi_1 + \pi_2 = \pi_1' + \pi_2'$, then
$P(\Pi_Y|\pi_1,\pi_2) = P(\Pi_Y|\pi_1',\pi_2')$, and hence the receiver cannot distinguish between the two pairs.

In the following we will discuss this indistinguishability in detail, and derive the maximum number of 
distinguishable pairs which can be conveyed through the channel. In order to do so, we start with some 
useful definitions and lemmas. 

Definition 8: For a fixed $\pi_1 \in \mathrm{Gr}(T,d_1)$, we denote by $\mathcal{N}(\pi_1,d_2,d_{12})$ the set of subspaces of
dimension $d_2$ that intersect with $\pi_1$ at $d_{12}$ dimensions, i.e.,

$$\mathcal{N}(\pi_1,d_2,d_{12}) \triangleq \{\pi_2 \in \mathrm{Gr}(T,d_2) : \dim(\pi_1 \cap \pi_2) = d_{12}\}. \tag{39}$$

It turns out that the cardinality of the set $\mathcal{N}(\pi_1,d_2,d_{12})$ depends on $\pi_1$ only through its dimension,
$d_1 = \dim(\pi_1)$. Therefore, we denote this number by $n(d_1,d_2,d_{12})$, which is characterized in the following
lemma.

Lemma 12: The cardinality of the set $\mathcal{N}(\pi_1,d_2,d_{12})$ is asymptotically given by

$$n(d_1,d_2,d_{12}) = |\mathcal{N}(\pi_1,d_2,d_{12})| \doteq q^{d_{12}(d_1-d_{12}) + (d_2-d_{12})(T-d_2)}. \tag{40}$$

Definition 9: For a fixed $\pi_1 \in \mathrm{Gr}(T,d_1)$ and $\pi_2 \in \mathrm{Gr}(T,d_2)$, we define

$$\mathcal{A}(\pi_1,\pi_2) \triangleq \{\pi_2' \in \mathrm{Gr}(T,d_2) : \pi_1 + \pi_2' = \pi_1 + \pi_2\}. \tag{41}$$

Lemma 13: The cardinality of the set $\mathcal{A}(\pi_1,\pi_2)$ only depends on the dimensions of the two subspaces
and their intersection, $d_1 = \dim(\pi_1)$, $d_2 = \dim(\pi_2)$, and $d_{12} = \dim(\pi_1 \cap \pi_2)$. Moreover, it can be
asymptotically characterized by

$$a(d_1,d_2,d_{12}) = |\mathcal{A}(\pi_1,\pi_2)| \doteq q^{d_2(d_1-d_{12})}. \tag{42}$$

Definition 10: For an arbitrary set $\mathcal{C} \subseteq \mathrm{Sp}(T,m)$, we denote the projection of $\mathcal{C}$ onto the
$d$-dimensional Grassmannian by $\mathcal{C}(d)$. Formally,

$$\mathcal{C}(d) \triangleq \mathcal{C} \cap \mathrm{Gr}(T,d) = \{\pi \in \mathcal{C} : \dim(\pi) = d\}.$$

For a fixed time instance $t$ and corresponding subsets $\mathcal{C}_{1,t}$ and $\mathcal{C}_{2,t}$, we can construct a table with
$|\mathcal{C}_{1,t}|$ rows and $|\mathcal{C}_{2,t}|$ columns, each row (column) corresponding to one subspace $\pi_1$ ($\pi_2$) in $\mathcal{C}_{1,t}$ ($\mathcal{C}_{2,t}$).
In the following, we define an equivalence relation for the cells of this table.

Definition 11: A coloring for a table constructed as above is an assignment of colors to the cells
of the table using a function $\mathrm{col} : \mathcal{C}_{1,t} \times \mathcal{C}_{2,t} \to \mathbb{N}$ such that $\mathrm{col}(\pi_1,\pi_2) = \mathrm{col}(\pi_1',\pi_2')$ if and only if
$\pi_1 + \pi_2 = \pi_1' + \pi_2'$.

It is clear that the coloring definition above exactly matches the notion of indistinguishability we discussed
before. More precisely, two pairs of subspaces $(\pi_1,\pi_2)$ and $(\pi_1',\pi_2')$ are distinguishable if and only if
their corresponding cells in the table have different colors. The following theorem upper bounds the
cardinality of the subspace sets based on this fact.
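
The coloring of Definition 11 can be computed by brute force for a toy example. The sketch below is an illustration only: it is restricted to the binary field, encodes vectors of $\mathbb{F}_2^T$ as integer bitmasks, and colors one row of the table by the sum subspace $\pi_1 + \pi_2$. The function names are hypothetical, and at $q = 2$ the counts only loosely reflect the large-$q$ asymptotics used in the text.

```python
from itertools import combinations

def span(vectors):
    """Subspace of F_2^T spanned by bitmask-encoded vectors, as a frozenset."""
    s = {0}
    for v in vectors:
        s |= {u ^ v for u in s}  # adding v either doubles the span or leaves it
    return frozenset(s)

def grassmannian(T, d):
    """All d-dimensional subspaces of F_2^T (brute force, small T only)."""
    subspaces = set()
    for vs in combinations(range(1, 2 ** T), d):
        sp = span(vs)
        if len(sp) == 2 ** d:
            subspaces.add(sp)
    return subspaces

def row_colors(pi1, codebook2):
    """Group the subspaces pi2 in one table row by the color pi1 + pi2."""
    colors = {}
    for pi2 in codebook2:
        colors.setdefault(span(list(pi1) + list(pi2)), []).append(pi2)
    return colors

pi1 = span([0b0001])
colors = row_colors(pi1, grassmannian(4, 1))
print(len(colors), sorted(len(group) for group in colors.values()))
```

For $T = 4$ and one-dimensional code-books, the row of $\pi_1$ contains one cell with $\pi_2 = \pi_1$ and seven further colors, each shared by two subspaces $\pi_2$ that the receiver cannot tell apart.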

Theorem 8: For each pair of uniquely distinguishable sets $(\mathcal{C}_{1,t},\mathcal{C}_{2,t})$ defined on the input alphabet
$\mathcal{X}_1 \times \mathcal{X}_2$ for the multiple access channel $\mathrm{Ch}_{s\text{-MAC}}$, there exist integer numbers $0 \le \delta_i(t) \le m_i$ such that

$$|\mathcal{C}_{i,t}| \mathrel{\dot{\le}} q^{\delta_i(t)(T-\delta_1(t)-\delta_2(t))}, \quad i = 1,2. \tag{43}$$

Proof: We may drop the time index $t$ in this proof for brevity. For a fixed $t$, let $\delta_i$ be the dominating
dimension in the set $\mathcal{C}_i$, i.e.,

$$\delta_i = \arg\max_{d} |\mathcal{C}_i(d)|,$$

where $\mathcal{C}_i(d)$ is as defined in Definition 10. It is clear that

$$|\mathcal{C}_i| = \sum_{d} |\mathcal{C}_i(d)| \le m_i|\mathcal{C}_i(\delta_i)| \doteq |\mathcal{C}_i(\delta_i)|, \tag{44}$$

where the last asymptotic equality follows from the fact that $m_i$ is a constant with respect to the underlying
field size $q$. This means that we may lose only a constant factor in the code-book size by removing all
subspaces from $\mathcal{C}_1$ ($\mathcal{C}_2$), except the ones that have dimension $\delta_1$ ($\delta_2$). Therefore the loss in the rate values
would be negligible as $q$ grows. Consider the table constructed for $\mathcal{C}_1(\delta_1)$ and $\mathcal{C}_2(\delta_2)$. Let $\pi_1 \in \mathcal{C}_1(\delta_1)$
be a $\delta_1$-dimensional subspace, and consider the corresponding row of the table. We further partition the
columns of the table with respect to $\pi_1$ into $\bigcup_{d_{12}=0}^{\min[\delta_1,\delta_2]} \mathcal{C}_2(\pi_1,\delta_2,d_{12})$, where

$$\mathcal{C}_2(\pi_1,\delta_2,d_{12}) = \{\pi_2 \in \mathcal{C}_2(\delta_2) : \dim(\pi_1 \cap \pi_2) = d_{12}\}. \tag{45}$$

We use $K(\pi_1,\delta_2)$ and $K(\pi_1,\delta_2,d_{12})$ to denote the number of different colors in the row that corresponds
to $\pi_1$ and in its intersection with $\mathcal{C}_2(\pi_1,\delta_2,d_{12})$, respectively.

Note that $\mathcal{C}_2(\pi_1,\delta_2,d_{12}) \subseteq \mathcal{N}(\pi_1,\delta_2,d_{12})$, and therefore the number of different colors that
appear in this partition of the row cannot exceed the number of colors that could potentially appear if
$\mathcal{N}(\pi_1,\delta_2,d_{12}) \subseteq \mathcal{C}_2$. Recall that $\mathcal{N}(\pi_1,\delta_2,d_{12})$ has $n(\delta_1,\delta_2,d_{12})$ elements, which are split into subsets
of size $a(\delta_1,\delta_2,d_{12})$ of the same color. Therefore, for a large field size, the number of different colors
in this partition of the row corresponding to $\pi_1$ can be upper bounded as

$$K(\pi_1,\delta_2,d_{12}) \le \frac{n(\delta_1,\delta_2,d_{12})}{a(\delta_1,\delta_2,d_{12})} \doteq q^{(\delta_2-d_{12})(T-\delta_1-\delta_2+d_{12})}. \tag{46}$$

Hence,

$$\begin{aligned}
K(\pi_1,\delta_2) &= \sum_{d_{12}=0}^{\min[\delta_1,\delta_2]} K(\pi_1,\delta_2,d_{12}) \\
&\mathrel{\dot{\le}} \sum_{d_{12}=0}^{\min[\delta_1,\delta_2]} q^{(\delta_2-d_{12})(T-\delta_1-\delta_2+d_{12})} \\
&\doteq q^{\max_{0 \le d_{12} \le \min[\delta_1,\delta_2]} (\delta_2-d_{12})(T-\delta_1-\delta_2+d_{12})} \\
&= q^{\delta_2(T-\delta_1-\delta_2)},
\end{aligned} \tag{47}$$

where the asymptotic inequality and equality hold for large $q$. Moreover, the last equality is based on
the assumption $T > 2(m_1 + m_2) \ge 2(\delta_1 + \delta_2)$ and the fact that the exponent is a decreasing function of
$d_{12}$ for $0 \le d_{12} \le \min[\delta_1,\delta_2]$.
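
The exponent arithmetic behind (46) and (47) is easy to verify mechanically. The sketch below, written with hypothetical helper names, checks that the exponent of $n/a$ obtained from Lemma 12 and Lemma 13 equals $(\delta_2 - d_{12})(T - \delta_1 - \delta_2 + d_{12})$, and that this exponent is maximized at $d_{12} = 0$ whenever $T \ge 2(\delta_1 + \delta_2)$.

```python
def n_exponent(T, d1, d2, d12):
    """Exponent of q in n(d1, d2, d12), from Lemma 12."""
    return d12 * (d1 - d12) + (d2 - d12) * (T - d2)

def a_exponent(d1, d2, d12):
    """Exponent of q in a(d1, d2, d12), from Lemma 13."""
    return d2 * (d1 - d12)

def colors_exponent(T, d1, d2, d12):
    """Exponent of q in the bound (46) on the number of colors."""
    return (d2 - d12) * (T - d1 - d2 + d12)

def dominant_intersection(T, d1, d2):
    """Value of d12 that maximizes the exponent in (47)."""
    return max(range(min(d1, d2) + 1),
               key=lambda d12: colors_exponent(T, d1, d2, d12))

T, d1, d2 = 14, 4, 3
for d12 in range(min(d1, d2) + 1):
    assert n_exponent(T, d1, d2, d12) - a_exponent(d1, d2, d12) \
        == colors_exponent(T, d1, d2, d12)
print(dominant_intersection(T, d1, d2), colors_exponent(T, d1, d2, 0))
```

With the dominant term at $d_{12} = 0$, the exponent reduces to $\delta_2(T - \delta_1 - \delta_2)$, which is the last line of (47).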

It is worth mentioning that this argument holds for each choice of $\pi_1 \in \mathcal{C}_1(\delta_1)$. This means that if the first
user transmits a $\delta_1$-dimensional subspace, the receiver cannot distinguish more than $q^{\delta_2(T-\delta_1-\delta_2)}$ different
symbols. The same argument holds for a fixed column $\pi_2 \in \mathcal{C}_2$, which yields an upper bound on the
number of distinguishable messages of $q^{\delta_1(T-\delta_1-\delta_2)}$. ■

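Theorem 8 essentially upper bounds the single-letter mutual information $I(\Pi_{X_{1t}};\Pi_{Y_t}|\Pi_{X_{2t}})$ for any
time instance $t$. The following proposition summarizes this discussion.

Proposition 2: We have $\mathcal{R} \subseteq \mathcal{R}_{\mathrm{col}}$, where

$$\mathcal{R}_{\mathrm{col}} = \text{convex hull} \bigcup_{(d_1,d_2)\in\mathcal{D}_{\mathrm{col}}} \mathcal{R}(d_1,d_2),$$

in which $\mathcal{R}(d_1,d_2)$ is as defined in (14), and

$$\mathcal{D}_{\mathrm{col}} = \{(d_1,d_2) : 0 \le d_i \le m_i\}.$$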

Proof: Using Theorem 8, we can upper bound the number of distinguishable pairs for each time
instance. For a fixed $t$, let $\delta_1(t)$ and $\delta_2(t)$ denote the dominating dimensions. Therefore, we have

$$\begin{aligned}
R_1 &\le \frac{1}{N}\sum_{t=1}^{N} I(\Pi_{X_{1t}};\Pi_{Y_t}|\Pi_{X_{2t}}) \\
&\le \frac{1}{N}\sum_{t=1}^{N} \log_2 q^{[\delta_1(t)(T-\delta_1(t)-\delta_2(t))]} \\
&= \frac{1}{N}\sum_{t=1}^{N} \delta_1(t)(T-\delta_1(t)-\delta_2(t))\log_2 q,
\end{aligned}$$

where $0 \le \delta_i(t) \le m_i$ for $t = 1, \ldots, N$, and $i = 1,2$. Similarly, we have

$$R_2 \le \frac{1}{N}\sum_{t=1}^{N} \delta_2(t)(T-\delta_1(t)-\delta_2(t))\log_2 q.$$

Therefore,

$$(R_1,R_2) \le \frac{1}{N}\sum_{t=1}^{N} \big(\delta_1(t)(T-\delta_1(t)-\delta_2(t))\log_2 q,\ \delta_2(t)(T-\delta_1(t)-\delta_2(t))\log_2 q\big). \tag{48}$$

It is clear that the RHS of (48) is a convex linear combination of the points

$$\big\{\big(\delta_1(t)(T-\delta_1(t)-\delta_2(t))\log_2 q,\ \delta_2(t)(T-\delta_1(t)-\delta_2(t))\log_2 q\big)\big\}_{t=1}^{N},$$

which are in the regions $\mathcal{R}(\delta_1(t),\delta_2(t))$. This completes the proof. ■
Summarizing Proposition 1 and Proposition 2, we have $\mathcal{R} \subseteq \mathcal{R}_{\mathrm{coop}} \cap \mathcal{R}_{\mathrm{col}}$. So, it only remains to
prove the following theorem in order to show that $\mathcal{R}^*$ is an outer bound for the admissible rate region.

Theorem 9: We have $\mathcal{R}_{\mathrm{coop}} \cap \mathcal{R}_{\mathrm{col}} \subseteq \mathcal{R}^*$.

Before presenting the proof of the theorem, we give the following two lemmas, which help us to
characterize the corner points of the region of interest.

Lemma 14: The set of corner points of $\mathcal{R}_{\mathrm{col}}$ is the set of all rate pairs of the form

$$(R_1,R_2) = (R_1(d_1,d_2), R_2(d_1,d_2)),$$

for some $(d_1,d_2) \in \mathcal{D}$, where

$$\mathcal{D} = \{(0,m_2), (1,m_2), \ldots, (m_1,m_2), (m_1,m_2-1), \ldots, (m_1,1), (m_1,0)\}.$$

Lemma 15: If $\mathcal{R}_{\mathrm{col}} \not\subseteq \mathcal{R}_{\mathrm{coop}}$, then any intersecting point of $R_1 + R_2 = k(T-k)\log_2 q$ with the
boundary of $\mathcal{R}_{\mathrm{col}}$ is a point of the form $(R_1(d_1,d_2), R_2(d_1,d_2))$, where

$$(d_1,d_2) \in \mathcal{D} \cup \{(m_1-1,0), \ldots, (0,0), (0,1), \ldots, (0,m_2-1)\}.$$

That is, the boundaries of $\mathcal{R}_{\mathrm{col}}$ and $\mathcal{R}_{\mathrm{coop}}$ can only intersect on either the corner points of $\mathcal{R}_{\mathrm{col}}$ or the
$R_1$ and $R_2$ axes.

Proof of Theorem 9: Note that $\mathcal{R}_{\mathrm{coop}} \cap \mathcal{R}_{\mathrm{col}}$ is a convex polytope, formed as the intersection of a
polytope and the convex hull of a finite number of polytopes. Therefore, it suffices to prove the theorem
only for its corner points. Let $(R_1,R_2) \in \mathcal{R}_{\mathrm{coop}} \cap \mathcal{R}_{\mathrm{col}}$ be a corner point. It is clear that one of the
following occurs:

(i) $(R_1,R_2)$ is a corner point of $\mathcal{R}_{\mathrm{col}}$ and an interior point of $\mathcal{R}_{\mathrm{coop}}$;

(ii) $(R_1,R_2)$ is an intersecting point of the boundaries of $\mathcal{R}_{\mathrm{col}}$ and $\mathcal{R}_{\mathrm{coop}}$.

In the former case, Lemma 14, which characterizes the set of corner points of $\mathcal{R}_{\mathrm{col}}$, implies there exists
a pair $(d_1,d_2) \in \mathcal{D}$ such that $(R_1,R_2) = (R_1(d_1,d_2), R_2(d_1,d_2))$. Also, $(R_1,R_2) \in \mathcal{R}_{\mathrm{coop}}$ implies

$$(d_1+d_2)(T-(d_1+d_2))\log_2 q = R_1 + R_2 \le k(T-k)\log_2 q.$$

Note that the function $f(x) = x(T-x)$ is an increasing function of $x$ for $x \in (0,T/2)$. Therefore,
$d_1 + d_2 \le k = \min\{m_1+m_2, n\}$, and hence $(d_1,d_2) \in \mathcal{D}^*$, which implies that $(R_1,R_2) \in \mathcal{R}^*$.

In the latter case, it follows from Lemma 15 that $(R_1,R_2)$ should be either a corner point of $\mathcal{R}_{\mathrm{col}}$,
for which the above argument holds, or of the form $(R_1,R_2) = (R_1(d_1,d_2), R_2(d_1,d_2))$ with $d_1d_2 = 0$.
Again $(R_1,R_2) \in \mathcal{R}_{\mathrm{coop}}$, which implies that $d_1 + d_2 \le k = \min\{m_1+m_2, n\}$, and $(R_1,R_2) \in \mathcal{R}^*$. This
completes the proof. ■

Corollary 1: The number of corner points of the rate region $\mathcal{R}^*$ excluding the point $(0,0)$ is equal to

$$\min[m_1, (n-m_2)^+] + \min[m_2, (n-m_1)^+] + 2 - \mathbb{1}_{\{n \ge m_1+m_2\}}.$$

Proof: By Lemma 14 the set of corner points of the region $\mathcal{R}_{\mathrm{col}}$ corresponds to the pairs $(d_1,d_2)$ which
belong to the set $\{(0,m_2), \ldots, (m_1,m_2), \ldots, (m_1,0)\}$. In this case the number of corner points excluding
$(R_1,R_2) = (0,0)$ is $m_1 + m_2 + 1$.

However, the final rate region is the intersection of $\mathcal{R}_{\mathrm{col}}$ and $\mathcal{R}_{\mathrm{coop}}$, where the latter includes all
the rate pairs with sum smaller than $k(T-k)\log_2 q$, $k = \min[m_1+m_2, n]$; see Proposition 1.

Lemma 15 explains how these two regions intersect with each other. In this case, the corner points
correspond to the pairs $(d_1,d_2)$ which belong to the set $\{(0,m_2), \ldots, (\alpha,m_2), (m_1,\beta), \ldots, (m_1,0)\}$,
where $\alpha = \min[m_1, (n-m_2)^+]$ and $\beta = \min[m_2, (n-m_1)^+]$. So the number of corner points excluding
$(0,0)$ is

$$\alpha + \beta + 2 - \mathbb{1}_{\{n \ge m_1+m_2\}},$$

where $\mathbb{1}_{\{n \ge m_1+m_2\}}$ takes into account the case where the two points $(\alpha,m_2)$ and $(m_1,\beta)$ overlap with each
other. ■

VI. Conclusions 

In this paper, we used a random matrix channel to model the problem of multicasting over a packet
network that employs randomized network coding. We calculated the capacity of this channel for the case
where the finite field of operation $\mathbb{F}_q$ is large, and showed through simulation results that convergence
is fast even for small values of $q$. We proved that use of subspace coding, proposed for algebraic coding in
[7], is optimal for this channel. Moreover, we showed that the capacity achieving distribution for very
small packet lengths uses subspaces of all dimensions, while as the packet length increases, the number
of required dimensions in the optimal distribution decreases. In particular, the choice of the subspace
dimension used in the seminal work of Koetter and Kschischang [6] is indeed optimal for large enough
packet size. We extended our work to the case of multiple access with two sources, where we used a
coloring argument to derive an outer bound for the capacity that we believe is interesting in itself. We
showed that in all the cases we examined, the throughput benefits subspace coding offers as compared
to the use of coding vectors go to zero as the alphabet size $q$ increases, and thus use of coding vectors
is (asymptotically) optimal.
result is used in the proof of Theorem 3, which gives a non-asymptotic characterization.



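Appendix A
Proofs

Proof of Theorem 1: To prove the theorem, we start with $I(X;Y)$ for the channel $\mathrm{Ch}_m$, stated in
(16), where the channel transition probability is given in (15). We will show that for each input distribution
$P_X(x)$ there exists an input distribution $P_{\Pi_X}(\pi_x)$ for the channel $\mathrm{Ch}_s$ such that $I(X;Y) = I(\Pi_X;\Pi_Y)$
and vice versa.

We know that $P_{Y|X}(y|x) = P_{Y|X}(y|x')$ if $\langle x \rangle = \langle x' \rangle$. So we can write

$$P_Y(y) = \sum_{x} P_X(x)\, P_{Y|X}(y|x) = \sum_{\pi_x} P_{\Pi_X}(\pi_x)\, P_{Y|\Pi_X}(y|\pi_x),$$

where we choose $P_{\Pi_X}(\pi_x) = \sum_{x:\, \langle x \rangle = \pi_x} P_X(x)$ and define

$$P_{Y|\Pi_X}(y|\pi_x) \triangleq \begin{cases} q^{-n \dim(\pi_x)} & \text{if } \langle y \rangle \subseteq \pi_x, \\ 0 & \text{otherwise.} \end{cases}$$

Then expanding $I(X;Y)$ we have

$$I(X;Y) = \sum_{\pi_x} P_{\Pi_X}(\pi_x) \sum_{y} P_{Y|\Pi_X}(y|\pi_x) \log_2\!\left( \frac{P_{Y|\Pi_X}(y|\pi_x)}{P_Y(y)} \right).$$

Now using the symmetry properties of $P_{Y|\Pi_X}(y|\pi_x)$ we can simplify $I(X;Y)$. In fact $P_{Y|\Pi_X}(y_1|\pi_x) =
P_{Y|\Pi_X}(y_2|\pi_x)$ and $P_Y(y_1) = P_Y(y_2)$ if $\langle y_1 \rangle = \langle y_2 \rangle$. So we can remove the summation over $y$ and write

$$I(X;Y) = \sum_{\pi_x} P_{\Pi_X}(\pi_x) \sum_{\pi_y} \psi(T,n,\pi_y)\, P_{Y|\Pi_X}(y|\pi_x) \log_2\!\left( \frac{P_{Y|\Pi_X}(y|\pi_x)}{P_Y(y)} \right)$$

for some matrix $y$ such that $\langle y \rangle = \pi_y$. Remember that $\psi(T,n,\pi_y)$ is defined in Definition 3, §II. Defining
$P_{\Pi_Y|\Pi_X}(\pi_y|\pi_x) = \psi(T,n,\pi_y)\, P_{Y|\Pi_X}(y|\pi_x)$ for some $y$ with $\langle y \rangle = \pi_y$, we can write

$$I(X;Y) = \sum_{\pi_x} \sum_{\pi_y} P_{\Pi_X}(\pi_x)\, P_{\Pi_Y|\Pi_X}(\pi_y|\pi_x) \log_2\!\left( \frac{P_{\Pi_Y|\Pi_X}(\pi_y|\pi_x)}{P_{\Pi_Y}(\pi_y)} \right) = I(\Pi_X;\Pi_Y).$$

Based on the above discussion, going back from the channel $\mathrm{Ch}_s$ to $\mathrm{Ch}_m$ is very easy. It is sufficient to
choose

$$P_X(x) = \frac{P_{\Pi_X}(\pi_x)}{\psi(T,m,\pi_x)}, \qquad \forall x:\ \langle x \rangle = \pi_x,$$

for all subspaces $\pi_x$. This completes the proof. ■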
Proof of Lemma 2: We want to count the number of different matrices $X \in \mathbb{F}_q^{n \times T}$ such that
$\langle X \rangle = \pi_d$, where $\pi_d$ is a specific $d$-dimensional subspace of $\mathbb{F}_q^T$.
We know that we can decompose $X$ as

$$X = AB, \qquad A \in \mathbb{F}_q^{n \times d}, \quad B \in \mathbb{F}_q^{d \times T},$$

where $A$ and $B$ are full rank matrices. Let us fix $B$ such that $\langle B \rangle = \pi_d$. Now for every two different
full rank matrices $A$ and $A'$ we would obtain different matrices $X = AB$ and $X' = A'B$ such that
$X \ne X'$ and $\langle X \rangle = \langle X' \rangle = \pi_d$. So the number of different $X$ where $\langle X \rangle = \pi_d$ is equal to the number
of full rank $n \times d$ matrices over $\mathbb{F}_q$, which is equal to $\prod_{i=0}^{d-1} (q^n - q^i)$, and we are done. ■
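The count in Lemma 2 can be confirmed by exhaustive enumeration for small parameters. The sketch below is an illustration only, restricted to $q = 2$ so that rows can be stored as integer bitmasks; the function names are hypothetical. It groups all $n \times T$ binary matrices by row space and compares each group size with $\prod_{i=0}^{d-1}(q^n - q^i)$.

```python
from collections import Counter
from itertools import product


def row_space(rows):
    """Span over GF(2) of the given rows, each row an integer bitmask."""
    space = {0}
    for r in rows:
        space |= {v ^ r for v in space}
    return frozenset(space)


def psi(q, n, d):
    """Count from Lemma 2: prod_{i=0}^{d-1} (q^n - q^i)."""
    result = 1
    for i in range(d):
        result *= q**n - q**i
    return result


def check_lemma2(n, T):
    """True if every row space of GF(2)^T is hit by exactly psi(2, n, d) matrices."""
    counts = Counter(row_space(rows) for rows in product(range(2**T), repeat=n))
    for space, count in counts.items():
        d = len(space).bit_length() - 1   # |space| = 2^d
        if count != psi(2, n, d):
            return False
    return True
```

Calling `check_lemma2(3, 3)` enumerates all 512 binary $3 \times 3$ matrices; subspaces of dimension larger than $n$ never appear, in line with the formula vanishing for $d > n$.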
Proof of Lemma 5: Let $P_{\Pi_X}(\pi_x)$ be the optimal input distribution of the channel $\mathrm{Ch}_s$ with the transition
probabilities defined for this channel. For a fixed dimension $0 \le d \le \min[m,T]$, and an arbitrary permutation

$$\sigma:\ \left\{1, 2, \ldots, \genfrac{[}{]}{0pt}{}{T}{d}_q \right\} \to \left\{1, 2, \ldots, \genfrac{[}{]}{0pt}{}{T}{d}_q \right\}$$

which acts on subspaces of dimension $d$, define $P_\sigma(\pi_x)$ as

$$P_\sigma(\pi_x) \triangleq \begin{cases} P_{\Pi_X}(\sigma(\pi_x)) & \text{if } \dim(\pi_x) = d, \\ P_{\Pi_X}(\pi_x) & \text{if } \dim(\pi_x) \ne d. \end{cases}$$

Also define $P^*(\pi_x) = \frac{1}{\genfrac{[}{]}{0pt}{}{T}{d}_q !} \sum_\sigma P_\sigma(\pi_x)$, where the summation is over all possible permutations. Rewriting
the mutual information in (17) as a function of the input distribution and the transition probabilities,
$I\big(P_{\Pi_X}(\pi_x), P_{\Pi_Y|\Pi_X}(\pi_y|\pi_x)\big)$, we have

$$I\big(P^*(\pi_x), P_{\Pi_Y|\Pi_X}(\pi_y|\pi_x)\big) \overset{(a)}{\ge} \frac{1}{\genfrac{[}{]}{0pt}{}{T}{d}_q !} \sum_\sigma I\big(P_\sigma(\pi_x), P_{\Pi_Y|\Pi_X}(\pi_y|\pi_x)\big) \overset{(b)}{=} I\big(P_{\Pi_X}(\pi_x), P_{\Pi_Y|\Pi_X}(\pi_y|\pi_x)\big),$$

where (a) is due to concavity of the mutual information with respect to the input distribution, and (b)
holds because $I\big(P_\sigma(\pi_x), P_{\Pi_Y|\Pi_X}(\pi_y|\pi_x)\big) = I\big(P_{\Pi_X}(\pi_x), P_{\Pi_Y|\Pi_X}(\pi_y|\pi_x)\big)$ for all $\sigma$, since the permutation
only permutes the terms in a summation in (17).

Note that $P^*(\pi_x)$ assigns equal probabilities to all subspaces with dimension $d$, and the above-mentioned
inequality shows that it is as good as the optimal input distribution. A similar argument
holds for all $0 \le d \le \min[m,T]$. Therefore, a dimensional-uniform distribution achieves the capacity of
the channel. ■
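Proof of Lemma 9: Assuming an optimal input probability distribution of the form (18), the
probability of receiving a specific subspace $\Pi_Y = \pi_y$ at the receiver can be written as

$$P_{\Pi_Y}(\pi_y) = \sum_{\pi_x} P_{\Pi_Y|\Pi_X}(\pi_y|\pi_x)\, P_{\Pi_X}(\pi_x).$$

Splitting the summation into two, we can write

$$P_{\Pi_Y}(\pi_y) = \psi(T,n,\pi_y) \sum_{d_x=d_y}^{\min[m,T]} \frac{\alpha_{d_x}\, q^{-n d_x}}{\genfrac{[}{]}{0pt}{}{T}{d_x}_q} \sum_{\substack{\pi_x:\ \dim(\pi_x)=d_x, \\ \pi_y \subseteq \pi_x}} 1, \tag{49}$$

where $d_y = \dim(\pi_y)$. Using the following result, Lemma 16, we can replace the second summation in
(49).

Lemma 16: Let $\pi_y$ be a fixed subspace of $\mathbb{F}_q^T$ with dimension $d_y$. Then the number of different
subspaces $\pi_x \subseteq \mathbb{F}_q^T$ with dimension $d_x$, $d_y \le d_x \le T$, that contain $\pi_y$ is equal to $\genfrac{[}{]}{0pt}{}{T-d_y}{d_x-d_y}_q$.

Proof: This lemma can be proved by applying [24, Lemma 2] with proper choice of the parameters. ■

Using Lemma 16 we can rewrite (49) as

$$P_{\Pi_Y}(\pi_y) = \psi(T,n,\pi_y) \sum_{d_x=d_y}^{\min[m,T]} \frac{\genfrac{[}{]}{0pt}{}{T-d_y}{d_x-d_y}_q}{\genfrac{[}{]}{0pt}{}{T}{d_x}_q}\, \alpha_{d_x}\, q^{-n d_x} \overset{(a)}{=} \frac{\psi(T,n,\pi_y)}{\genfrac{[}{]}{0pt}{}{T}{d_y}_q} \sum_{d_x=d_y}^{\min[m,T]} \genfrac{[}{]}{0pt}{}{d_x}{d_y}_q \alpha_{d_x}\, q^{-n d_x}, \tag{50}$$

where (a) follows from the following result, Lemma 17.

Lemma 17: The following relation for the Gaussian number holds [26], [25]:

$$\genfrac{[}{]}{0pt}{}{T-d_y}{d_x-d_y}_q \genfrac{[}{]}{0pt}{}{T}{d_y}_q = \genfrac{[}{]}{0pt}{}{T}{d_x}_q \genfrac{[}{]}{0pt}{}{d_x}{d_y}_q.$$

Now we can simplify the mutual information $I(\Pi_X;\Pi_Y)$ in (17) as follows. Using the transition
probabilities of $\mathrm{Ch}_s$, (18), and (50) for $I(\Pi_X;\Pi_Y)$ we can write

$$I(\Pi_X;\Pi_Y) = \sum_{\pi_x, \pi_y} P_{\Pi_X}(\pi_x)\, P_{\Pi_Y|\Pi_X}(\pi_y|\pi_x) \log_2\!\left( \frac{P_{\Pi_Y|\Pi_X}(\pi_y|\pi_x)}{P_{\Pi_Y}(\pi_y)} \right)$$

$$= \sum_{d_x=0}^{\min[m,T]} \sum_{d_y=0}^{\min[n,d_x]} \sum_{\substack{\pi_x: \\ \dim(\pi_x)=d_x}} \sum_{\substack{\pi_y:\ \dim(\pi_y)=d_y, \\ \pi_y \subseteq \pi_x}} \frac{\alpha_{d_x}}{\genfrac{[}{]}{0pt}{}{T}{d_x}_q}\, \psi(T,n,\pi_y)\, q^{-n d_x} \log_2\!\left( \frac{q^{-n d_x}}{f(d_y)} \right),$$

where

$$f(d_y) \triangleq \frac{P_{\Pi_Y}(\pi_y)}{\psi(T,n,\pi_y)} = \frac{1}{\genfrac{[}{]}{0pt}{}{T}{d_y}_q} \sum_{d_x=d_y}^{\min[m,T]} \genfrac{[}{]}{0pt}{}{d_x}{d_y}_q q^{-n d_x}\, \alpha_{d_x}, \tag{51}$$

because $P_{\Pi_Y}(\pi_y)/\psi(T,n,\pi_y)$ only depends on $d_y$. Now observe that the two innermost summations depend on $\pi_x$
and $\pi_y$ only through their dimensions. So we can write

$$I(\Pi_X;\Pi_Y) = \sum_{d_x=0}^{\min[m,T]} \alpha_{d_x}\, q^{-n d_x} \sum_{d_y=0}^{\min[n,d_x]} \psi(n,d_y) \genfrac{[}{]}{0pt}{}{d_x}{d_y}_q \log_2\!\left( \frac{q^{-n d_x}}{f(d_y)} \right).$$

Then using Lemma 4 in §II-B we can further simplify the mutual information and write

$$I(\Pi_X;\Pi_Y) = -\sum_{d_x=0}^{\min[m,T]} \alpha_{d_x}\, n\, d_x \log_2 q \; - \sum_{d_x=0}^{\min[m,T]} \alpha_{d_x}\, q^{-n d_x} \sum_{d_y=0}^{\min[n,d_x]} \psi(n,d_y) \genfrac{[}{]}{0pt}{}{d_x}{d_y}_q \log_2(f(d_y)), \tag{52}$$

that is the assertion of Lemma 9. ■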
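The two Gaussian-number facts used in this proof (the containment count of Lemma 16 and the identity of Lemma 17) are easy to test numerically. The sketch below is illustrative only: the helper names are hypothetical, the brute-force part is limited to $q = 2$ with vectors stored as bitmasks, and the Gaussian number is computed from its standard product formula.

```python
def gaussian_binomial(a, b, q):
    """Gaussian number [a choose b]_q: number of b-dimensional subspaces of F_q^a."""
    if b < 0 or b > a:
        return 0
    num = den = 1
    for i in range(b):
        num *= q**(a - i) - 1
        den *= q**(i + 1) - 1
    return num // den


def all_subspaces(T):
    """All subspaces of GF(2)^T, each stored as a frozenset of integer bitmasks."""
    found = {frozenset({0})}
    frontier = list(found)
    while frontier:
        new = []
        for s in frontier:
            for v in range(1, 2**T):
                if v not in s:
                    t = frozenset(s | {x ^ v for x in s})  # extend s by the vector v
                    if t not in found:
                        found.add(t)
                        new.append(t)
        frontier = new
    return found


def dim(space):
    return len(space).bit_length() - 1  # |space| = 2^dim


def lemma16_holds(T):
    """Brute-force check over GF(2)^T of the count of subspaces containing a fixed one."""
    subs = all_subspaces(T)
    for pi_y in subs:
        for d_x in range(dim(pi_y), T + 1):
            count = sum(1 for pi_x in subs if dim(pi_x) == d_x and pi_y <= pi_x)
            if count != gaussian_binomial(T - dim(pi_y), d_x - dim(pi_y), 2):
                return False
    return True


def lemma17_holds(T, d_x, d_y, q):
    """Identity [T-dy, dx-dy][T, dy] = [T, dx][dx, dy] for Gaussian numbers."""
    lhs = gaussian_binomial(T - d_y, d_x - d_y, q) * gaussian_binomial(T, d_y, q)
    rhs = gaussian_binomial(T, d_x, q) * gaussian_binomial(d_x, d_y, q)
    return lhs == rhs
```

The enumeration is only practical for small $T$; $\mathbb{F}_2^4$ has 67 subspaces, which is enough to exercise every $(d_y, d_x)$ pair.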
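Proof of Lemma 10: By taking the partial derivative of the mutual information with respect to $\alpha_k$,
we have that

$$I'_k \triangleq \frac{\partial I(\Pi_X;\Pi_Y)}{\partial \alpha_k} = -n k \log_2 q - q^{-nk} \sum_{d_y=0}^{\min[n,k]} \psi(n,d_y) \genfrac{[}{]}{0pt}{}{k}{d_y}_q \log_2(f(d_y))$$

$$\qquad - \sum_{d_x=0}^{\min[m,T]} \alpha_{d_x}\, q^{-n d_x} \sum_{d_y=0}^{\min[n,d_x,k]} \psi(n,d_y) \genfrac{[}{]}{0pt}{}{d_x}{d_y}_q \frac{\genfrac{[}{]}{0pt}{}{k}{d_y}_q q^{-nk}}{\genfrac{[}{]}{0pt}{}{T}{d_y}_q f(d_y)} \log_2 e$$

$$= -n k \log_2 q - q^{-nk} \sum_{d_y=0}^{\min[n,k]} \psi(n,d_y) \genfrac{[}{]}{0pt}{}{k}{d_y}_q \log_2(f(d_y))$$

$$\qquad - \sum_{d_y=0}^{\min[n,k]} \frac{\psi(n,d_y) \genfrac{[}{]}{0pt}{}{k}{d_y}_q q^{-nk}}{f(d_y)} \left( \frac{1}{\genfrac{[}{]}{0pt}{}{T}{d_y}_q} \sum_{d_x=d_y}^{\min[m,T]} \alpha_{d_x} \genfrac{[}{]}{0pt}{}{d_x}{d_y}_q q^{-n d_x} \right) \log_2 e$$

$$\overset{(a)}{=} -n k \log_2 q - \sum_{d_y=0}^{\min[n,k]} \psi(n,d_y) \genfrac{[}{]}{0pt}{}{k}{d_y}_q q^{-nk} \log_2(f(d_y)) - \log_2 e,$$

where to derive (a) we use the definition of $f(d_y)$ in (51) and Lemma 4 in §II-B. ■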
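As a numerical sanity check on this derivative, one can evaluate the mutual information expression (52), with $f$ as in (51), and compare a finite-difference slope against the closed form for $I'_k$. The sketch below is an assumption-laden illustration rather than the authors' code: the function names are hypothetical, `alpha[d]` stands for the total probability placed on dimension-$d$ subspaces (so `len(alpha) - 1` plays the role of $\min[m,T]$), and the $\alpha_d$ are treated as free variables when differentiating.

```python
import math


def gaussian_binomial(a, b, q):
    """Gaussian number [a choose b]_q."""
    if b < 0 or b > a:
        return 0
    num = den = 1
    for i in range(b):
        num *= q**(a - i) - 1
        den *= q**(i + 1) - 1
    return num // den


def psi(n, d, q):
    """Number of n-row matrices with a given d-dimensional row space (Lemma 2)."""
    result = 1
    for i in range(d):
        result *= q**n - q**i
    return result


def f(alpha, d_y, q, n, T):
    """Equation (51)."""
    total = sum(gaussian_binomial(d_x, d_y, q) * q**(-n * d_x) * alpha[d_x]
                for d_x in range(d_y, len(alpha)))
    return total / gaussian_binomial(T, d_y, q)


def mutual_information(alpha, q, n, T):
    """Equation (52), in bits."""
    total = 0.0
    for d_x, a in enumerate(alpha):
        if a == 0:
            continue  # zero-probability dimensions contribute nothing
        inner = sum(psi(n, d_y, q) * gaussian_binomial(d_x, d_y, q)
                    * math.log2(f(alpha, d_y, q, n, T))
                    for d_y in range(min(n, d_x) + 1))
        total -= a * n * d_x * math.log2(q) + a * q**(-n * d_x) * inner
    return total


def derivative(alpha, k, q, n, T):
    """Closed form for the partial derivative with respect to alpha[k]."""
    s = sum(psi(n, d_y, q) * gaussian_binomial(k, d_y, q)
            * math.log2(f(alpha, d_y, q, n, T))
            for d_y in range(min(n, k) + 1))
    return -n * k * math.log2(q) - q**(-n * k) * s - math.log2(math.e)
```

For $q = 2$ and $n = T = m = 1$ the channel reduces to a Z-channel, which gives an independent check on `mutual_information`: with $\alpha = (1/2, 1/2)$ the result should equal $h(1/4) - 1/2$ bits, where $h$ is the binary entropy function.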

Proof of Lemma 11: For convenience we rewrite (24) again

$$f(d_y) \doteq q^{-T d_y} \left( \sum_{d_x=d_y}^{\min[m,T]} q^{-(n-d_y) d_x}\, \alpha_{d_x} \right). \tag{53}$$

We prove the assertion in two steps for every $k$. First, let us assume that the $\alpha_i$'s are such that we have
$\log_2(f(\min[n,k])) = o(q)$. Then using (53) one can conclude that

$$\sum_{d_x=\min[n,k]}^{\min[m,T]} q^{-(n-d_y) d_x}\, \alpha_{d_x} = 2^{-o(q)},$$

so we should have $\alpha_i = 2^{-o(q)}$ for $\min[n,k] \le i \le \min[m,T]$. We know that $0 \le \alpha_i \le 1$, and
$\sum_{i=0}^{\min[m,T]} \alpha_i = 1$, so we can deduce that

$$\log_2(f(d_y)) = \begin{cases} o(q) & j < d_y \le \min[n,k], \\ \Theta(\log q) & 0 \le d_y \le j, \end{cases}$$

where $j$, $0 \le j \le \min[n,k]$, is the largest index such that $\alpha_j = \Theta(1)$. So in this case the dominating term
in the summation of (23) is the one obtained for $d_y = \min[n,k]$, because the order difference between
each term inside the summation of (23) is at least of order $\Theta(q)$.

Now, for the second case, let us assume that the $\alpha_i$'s are such that we have $\log_2(f(\min[n,k])) = \Omega(q)$.
We will show that this assumption leads to a contradiction. Using (53) we can write

$$\sum_{d_x=\min[n,k]}^{\min[m,T]} q^{-(n-d_y) d_x}\, \alpha_{d_x} = 2^{-\Omega(q)},$$

so we should have $\alpha_i = 2^{-\Omega(q)}$ for $\min[n,k] \le i \le \min[m,T]$. As before, we find the asymptotic
behavior of $\log_2(f(d_y))$ for different values of $d_y$, but in this case we should make finer regimes for
$\log_2(f(d_y))$. The asymptotic behavior of $\alpha_i$, $0 \le i < \min[n,k]$, is either $2^{-\Omega(q)}$ or $2^{-o(q)}$. So we can
write

$$\log_2(f(d_y)) = \begin{cases} \Omega(q) & l < d_y \le \min[n,k], \\ o(q) & j < d_y \le l, \\ \Theta(\log q) & 0 \le d_y \le j, \end{cases}$$

where $l$, $0 \le l < \min[n,k]$, is the largest index such that $\alpha_l = 2^{-o(q)}$, which means that $\alpha_i = 2^{-\Omega(q)}$ for
$l < i \le \min[m,T]$. As before, $j$, $0 \le j \le \min[n,k]$, is the largest index such that $\alpha_j = \Theta(1)$. Now we
check the Kuhn-Tucker conditions, (21), for $I'_k$ and $I'_j$. From the above argument we have that $I'_k = \Omega(q)$
and $I'_j = \Theta(\log q)$. We know that $\alpha_j = \Theta(1) > 0$, so we have $I'_j = \Theta(\log q) = \lambda$. On the other hand, we
have $I'_k = \Omega(q) \le \lambda$, which is a contradiction, implying that the second case cannot occur. This completes
the proof. ■
Proof of Lemma 12: There are $\genfrac{[}{]}{0pt}{}{d_1}{d_{12}}_q \doteq q^{d_{12}(d_1-d_{12})}$ different choices for the intersection of $\pi_1$ and
$\pi_2$. We have to choose $d_2 - d_{12}$ basis vectors for the rest of the subspace. This can be done in

$$\frac{(q^T - q^{d_1})(q^T - q^{d_1+1}) \cdots (q^T - q^{d_1+d_2-d_{12}-1})}{(q^{d_2} - q^{d_{12}})(q^{d_2} - q^{d_{12}+1}) \cdots (q^{d_2} - q^{d_2-1})} \doteq q^{(d_2-d_{12})(T-d_2)}$$

ways. So we have $n(d_1,d_2,d_{12}) \doteq q^{d_{12}(d_1-d_{12}) + (d_2-d_{12})(T-d_2)}$. The proof follows from the results in [24,
Lemma 2], by proper choice of parameters. Independently, an alternate proof of this lemma appeared in
our paper [11]. ■
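The counting argument of Lemma 12 yields an exact count before the asymptotic simplification, namely the Gaussian number for the intersection times the displayed ratio of products. The sketch below, with hypothetical function names and a brute-force part limited to $q = 2$, compares that exact count with direct enumeration of subspace pairs, and then checks the leading exponent for a large value of $q$.

```python
def gaussian_binomial(a, b, q):
    """Gaussian number [a choose b]_q."""
    if b < 0 or b > a:
        return 0
    num = den = 1
    for i in range(b):
        num *= q**(a - i) - 1
        den *= q**(i + 1) - 1
    return num // den


def count_formula(T, d1, d2, d12, q):
    """Exact number of d2-dim subspaces meeting a fixed d1-dim subspace in dimension d12."""
    num = den = 1
    for i in range(d2 - d12):
        num *= q**T - q**(d1 + i)      # next basis vector must avoid pi_1 + chosen vectors
        den *= q**d2 - q**(d12 + i)    # ordered choices that give the same subspace
    return gaussian_binomial(d1, d12, q) * num // den


def all_subspaces(T):
    """All subspaces of GF(2)^T as frozensets of integer bitmasks."""
    found = {frozenset({0})}
    frontier = list(found)
    while frontier:
        new = []
        for s in frontier:
            for v in range(1, 2**T):
                if v not in s:
                    t = frozenset(s | {x ^ v for x in s})
                    if t not in found:
                        found.add(t)
                        new.append(t)
        frontier = new
    return found


def dim(space):
    return len(space).bit_length() - 1


def lemma12_matches(T):
    """Compare count_formula with brute force for every fixed pi_1 in GF(2)^T."""
    subs = all_subspaces(T)
    for pi1 in subs:
        d1 = dim(pi1)
        for d2 in range(T + 1):
            for d12 in range(min(d1, d2) + 1):
                brute = sum(1 for pi2 in subs
                            if dim(pi2) == d2 and dim(pi1 & pi2) == d12)
                if brute != count_formula(T, d1, d2, d12, 2):
                    return False
    return True
```

The formula returns zero automatically when $d_1 + d_2 - d_{12} > T$, because one factor $q^T - q^{d_1+i}$ in the numerator vanishes.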
Proof of Lemma 13: Define $\pi = \pi_1 + \pi_2$, where $\dim(\pi) = \dim(\pi_1) + \dim(\pi_2) - \dim(\pi_1 \cap \pi_2) =
d_1 + d_2 - d_{12} = d$. The proof of this lemma is similar to that of Lemma 12, except that we can only choose
the last $d_2 - d_{12}$ basis vectors from $\pi$ instead of $\mathbb{F}_q^T$. Therefore, replacing $T$ in Lemma 12 with $d$, we
have $a(\pi_1,\pi_2) \doteq q^{d_{12}(d_1-d_{12}) + (d_2-d_{12})(d-d_2)} = q^{d_2(d_1-d_{12})}$. ■

Proof of Lemma 14: Let $(R_1,R_2)$ be a corner point of the region $\mathcal{R}_{\mathrm{col}}$. Since $\mathcal{R}_{\mathrm{col}}$ is the convex
hull of a set of primitive regions, there should exist a primitive region $\mathcal{R}(d_1,d_2)$ which contains $(R_1,R_2)$
as a corner point, i.e.,

$$\exists\, (d_1,d_2) \in \mathcal{D}_{\mathrm{col}}:\quad (R_1,R_2) = \big(R_1(d_1,d_2),\, R_2(d_1,d_2)\big).$$

We will show that any point $\big(R_1(d_1,d_2), R_2(d_1,d_2)\big)$ is dominated by the segment connecting
$\big(R_1(d_1+1,d_2), R_2(d_1+1,d_2)\big)$ and $\big(R_1(d_1,d_2+1), R_2(d_1,d_2+1)\big)$. In order to show that, we have to prove that
there exists some $\lambda \in [0,1]$, such that

$$R_1(d_1,d_2) \le \lambda R_1(d_1+1,d_2) + (1-\lambda) R_1(d_1,d_2+1),$$

$$R_2(d_1,d_2) \le \lambda R_2(d_1+1,d_2) + (1-\lambda) R_2(d_1,d_2+1). \tag{54}$$

After a little simplification, (54) can be rewritten as

$$\lambda\, [T - d_1 - d_2 - 1] \ge d_1,$$

$$(1-\lambda)\, [T - d_1 - d_2 - 1] \ge d_2,$$

or

$$\frac{d_1}{T - 1 - d_1 - d_2} \le \lambda \le \frac{T - 1 - d_1 - 2 d_2}{T - 1 - d_1 - d_2}.$$

The last two inequalities can be satisfied for some choice of $\lambda$ if and only if $d_1 + d_2 \le (T-1)/2$.
Therefore, if we have $d_1 < m_1$, $d_2 < m_2$, and $d_1 + d_2 \le (T-1)/2$ for some $(d_1,d_2) \in \mathcal{D}_{\mathrm{col}}$, then
$(d_1+1,d_2)$ and $(d_1,d_2+1)$ also belong to $\mathcal{D}_{\mathrm{col}}$, and hence $\big(R_1(d_1,d_2), R_2(d_1,d_2)\big)$ is an interior point
and cannot be on the boundary of the region. Eliminating such $(d_1,d_2)$ from $\mathcal{D}_{\mathrm{col}}$, we get $\mathcal{D}$.

It is also easy to show that all of the rate pairs corresponding to $(d_1,d_2) \in \mathcal{D}$ are on the boundary
of $\mathcal{R}_{\mathrm{col}}$. This can be done by comparing the slopes of the segments connecting consecutive points
(according to the order in which they appear in $\mathcal{D}$). The slopes are

$$S\big\{ (R_1(t,m_2), R_2(t,m_2));\ (R_1(t+1,m_2), R_2(t+1,m_2)) \big\} = -\frac{m_2}{T - 2t - m_2 - 1} \quad \text{for } 0 \le t < m_1,$$

$$S\big\{ (R_1(m_1,t), R_2(m_1,t));\ (R_1(m_1,t-1), R_2(m_1,t-1)) \big\} = -\frac{T - 2t - m_1 + 1}{m_1} \quad \text{for } 1 \le t \le m_2.$$

It is easy to check that all the slopes are negative and they are in a decreasing order. Therefore, no point
in the set $\mathcal{D}$ can be an interior point. ■
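The slope comparison can be reproduced with exact rational arithmetic. The sketch below is illustrative and rests on stated assumptions: rates are taken to be $R_i(d_1,d_2) = d_i (T - d_1 - d_2)$ in units of $\log_2 q$, $T$ is assumed large enough that every denominator is positive, and the function names are hypothetical. A direct computation for the second family of segments, from $(m_1,t)$ to $(m_1,t-1)$, gives the slope $-(T - 2t - m_1 + 1)/m_1$, which is what the check uses.

```python
from fractions import Fraction


def rate_point(d1, d2, T):
    """(R1, R2) in units of log2(q), assuming R_i = d_i * (T - d1 - d2)."""
    return (d1 * (T - d1 - d2), d2 * (T - d1 - d2))


def boundary_slopes(m1, m2, T):
    """Slopes between consecutive points of D, in the order (0,m2),...,(m1,m2),...,(m1,0)."""
    pairs = [(t, m2) for t in range(m1 + 1)] + [(m1, t) for t in range(m2 - 1, -1, -1)]
    pts = [rate_point(d1, d2, T) for d1, d2 in pairs]
    return [Fraction(y1 - y0, x1 - x0) for (x0, y0), (x1, y1) in zip(pts, pts[1:])]


def closed_form_slopes(m1, m2, T):
    """The same slopes from the two closed-form expressions."""
    first = [Fraction(-m2, T - 2 * t - m2 - 1) for t in range(m1)]
    second = [Fraction(-(T - 2 * t - m1 + 1), m1) for t in range(m2, 0, -1)]
    return first + second
```

With the hypothetical values $T = 20$, $m_1 = 2$, $m_2 = 3$, both functions return $-3/16, -3/14, -13/2, -15/2, -17/2$, a negative and strictly decreasing sequence, as the proof requires.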
Proof of Lemma 15: Note that $\mathcal{R}_{\mathrm{col}} \not\subseteq \mathcal{R}_{\mathrm{coop}}$ implies $m_1 + m_2 > n$. Since $\mathcal{R}_{\mathrm{col}}$ is a convex region,
its boundary intersects with the line $R_1 + R_2 = n(T-n)\log_2 q$ in exactly two points (it cannot be only
one point, otherwise it would be inside of $\mathcal{R}_{\mathrm{coop}}$). It is easy to verify that the rate points corresponding
to $(d_1,d_2) = ((n-m_2)^+, \min[m_2,n])$ and $(d_1,d_2) = (\min[m_1,n], (n-m_1)^+)$ lie on both the boundary
of $\mathcal{R}_{\mathrm{col}}$ and the line $R_1 + R_2 = n(T-n)\log_2 q$. Therefore this line cannot intersect with the boundary
of $\mathcal{R}_{\mathrm{col}}$ in any other point. ■
Appendix B
Extension to Packet Erasure Networks

Let us write the capacity for the erasure case as follows

$$C_e = \max_{P_X} I(X;Y,N)$$

$$= \max_{P_X} \big[ I(X;N) + I(X;Y|N) \big]$$

$$\overset{(a)}{=} \max_{P_X} I(X;Y|N)$$

$$= \max_{P_X} E_N\big[ I(X;Y) \big],$$

where (a) follows from the independence of the input distribution $P_X$ and the distribution of the number of
received packets $P_N$.

The Upper Bound:

We can write an upper bound for $C_e$ as follows

$$C_e = \max_{P_X} E_N\big[ I(X;Y) \big] \le E_N\Big[ \max_{P_X} I(X;Y) \Big] = E_N\big[ i^*(T-i^*)\log_2 q \big],$$

where $i^* = \min[m, N, \lfloor T/2 \rfloor]$. From here on let us assume that $m \le \lfloor T/2 \rfloor$. We thus have that $i^* = N$
and we can write

$$C_e \le E_N\big[ N(T-N)\log_2 q \big].$$

Let us define $\mu_1 = E_N[N]$ and $\mu_2 = E_N[N^2]$, so we can write

$$C_e \le (\mu_1 T - \mu_2)\log_2 q.$$

The Lower Bound:

For the lower bound we can write

$$C_e = \max_{P_X} E_N\big[ I(X;Y) \big]$$

$$\ge E_N\big[ I(X;Y) \big] \quad \text{for some } P_X$$

$$= E_N\big[ I(\Pi_X;\Pi_Y) \big] \quad \text{for some } P_{\Pi_X}.$$

From (19) we know that we can write

$$I(\Pi_X;\Pi_Y) = -\sum_{d_x=0}^{\min[m,T]} \alpha_{d_x} N d_x \log_2 q \; - \sum_{d_x=0}^{\min[m,T]} \alpha_{d_x}\, q^{-N d_x} \sum_{d_y=0}^{\min[N,d_x]} \psi(N,d_y) \genfrac{[}{]}{0pt}{}{d_x}{d_y}_q \log_2(f(d_y)),$$

where

$$f(d_y) \triangleq \frac{1}{\genfrac{[}{]}{0pt}{}{T}{d_y}_q} \sum_{d_x=d_y}^{\min[m,T]} \genfrac{[}{]}{0pt}{}{d_x}{d_y}_q q^{-N d_x}\, \alpha_{d_x}.$$

Now assume that $m \le \lfloor T/2 \rfloor$ and choose the input distribution to be $\alpha_k = 1$ for some $0 \le k \le m$
and $\alpha_i = 0$ for all $i \ne k$. Then for this input distribution we have

$$I(\Pi_X;\Pi_Y) = -kN\log_2 q - q^{-kN} \sum_{d_y=0}^{\min[N,k]} \psi(N,d_y) \genfrac{[}{]}{0pt}{}{k}{d_y}_q \log_2(f(d_y))$$

$$= -kN\log_2 q - \sum_{d_y=0}^{\min[N,k]} \left( q^{-kN}\, \psi(N,d_y) \genfrac{[}{]}{0pt}{}{k}{d_y}_q \right) \log_2(f(d_y)).$$

Then assuming $q$ is large we may approximate the above mutual information as follows

$$I(\Pi_X;\Pi_Y) \approx -kN\log_2 q - \sum_{d_y=0}^{\min[N,k]} q^{-(N-d_y)(k-d_y)} \log_2(f(d_y)).$$

The term $q^{-(N-d_y)(k-d_y)}$ in the summation is maximized for $d_y = \min[N,k]$, and because we had
shown before in Lemma 11 that $\log_2(f(d_y)) = \Theta(\log q)$, we can write

$$I(\Pi_X;\Pi_Y) \approx -kN\log_2 q - \log_2\big(f(\min[N,k])\big)$$

$$\approx -kN\log_2 q - \log_2\big( q^{\min[N,k](k-T) - kN} \big)$$

$$= \min[N,k]\,(T-k)\log_2 q.$$

So by choosing $k = m$ we can write the lower bound for $C_e$ as follows

$$C_e \ge E_N\big[ I(\Pi_X;\Pi_Y) \big] \quad \text{for some } P_{\Pi_X}$$

$$\approx E_N\big[ N(T-m)\log_2 q \big]$$

$$= \mu_1 (T-m)\log_2 q.$$
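To get a feel for how far apart the two bounds of this appendix are, they can be evaluated for a concrete distribution of the number of received packets $N$. The sketch below is a hypothetical illustration: the function names are made up, it assumes $N \le m \le \lfloor T/2 \rfloor$ as in the derivation, and the binomial erasure model (each of $m$ packets arriving independently with probability `p`) is an assumption introduced here, not a model taken from the paper.

```python
import math


def erasure_capacity_bounds(pmf_N, T, m, q):
    """Return (lower, upper) = (mu1*(T-m)*log2 q, (mu1*T - mu2)*log2 q) in bits.

    pmf_N maps each possible number of received packets N (assumed <= m <= T//2)
    to its probability.
    """
    mu1 = sum(N * p for N, p in pmf_N.items())        # E[N]
    mu2 = sum(N * N * p for N, p in pmf_N.items())    # E[N^2]
    log_q = math.log2(q)
    lower = mu1 * (T - m) * log_q
    upper = (mu1 * T - mu2) * log_q
    return lower, upper


def binomial_pmf(m, p):
    """Hypothetical erasure model: each of m packets is received independently w.p. p."""
    return {N: math.comb(m, N) * p**N * (1 - p)**(m - N) for N in range(m + 1)}
```

With no erasures ($N = m$ with probability one) the two bounds coincide at $m(T-m)\log_2 q$; with erasures the gap equals $E_N[N(m-N)]\log_2 q$, which is non-negative whenever $N \le m$.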